Why Ridits?

by Richard in 
statistics

Today I want to discuss the ridit transformation. Ridits were introduced by Bross (1958) as a way to deal with ordinal categorical data. The ridit of an observation is defined “relative to an identified distribution” which I will call the reference distribution.

Say the the reference distribution has $P(X=x) = p_x$. Then the ridit of an observation in category $t$ is defined by

\[r_t = \sum_{u < t} p_u + p_t/2\]

As an example, Bross provides a table which classifies auto accident injuries into the categories None, Minor, Moderate, Severe, Serious, Critical and Fatal. The reference distribution is given by the following table:

None Minor Mod Sev Ser Cri Ftl
17 54 60 19 9 6 14

So, for example, the ridit of a Serious injury is

\[\begin{align*}17/179 + 54/179 + 60/179 + \\ 19/179 + (9/179) \times 1/2 = 0.863.\end{align*}\]

The reason why I am interested in ridits is because I was recently intorduced to them as a way of including ordinal data in a regression model, particularly in the case of logistic regression or neural network models.

Compared to one-hot encoding, ridits have the obvious advantages of not introducing extra variables and respecting the fact that the variable is ordered.

What I wondered about ridits was: why not just take the cumulative density. In other words, why not just define the ridit to be $\sum_{u\le x}p_u$? Bross does not really explain this in his paper. A follow-up was promised but apparently never published (perhaps it did not get accepted for publication.)

Later, Brockett and Levine provided some axioms which lead to ridits, but they seem like rather a stretch. Certainly, Bross did not start with some axioms and then arrive at the ridit formula.

After some thought, I think I finally understand the reason for the definition chosen by Bross.

If we look at the usual cdf of a variable $X$, it has the property that $F_X(x)$ is uniformly distributed and has mean $1/2$. However, in the discrete case this is no longer true. The addition of $p_t/2$ in the ridit formula forces the mean of the ridits to be exactly $1/2$. This can be seen as follows:

\[\begin{align*}E[r_t] &= \sum_t (\sum_{u < t} p_u + p_t/2)p_t\\ &= \sum_{u < v}p_u p_v + \frac{1}{2}\sum_t p_t^2\\ &= \frac{1}{2}\sum_{u \neq v}p_u p_v + \frac{1}{2}\sum_{u=v}p_u p_v\\ &= \frac{1}{2}\sum_u \sum_v p_u p_v\\ &= \frac{1}{2}(\sum_u p_u )(\sum_v p_v)\\ &= 1/2\end{align*}\]

Bross knew that the ridits have this property, and clearly it is important to him because he mentions it in his paper, but I have no idea how he arrived at the formula in the first place. (Perhaps he experimented with writing the cumulative distribution forwards and backwards, and realised that the values balanced each other out.)

I am also not sure why having mean exactly 0.5 is important. Perhaps it makes hand calculation easier, or guarantees that the distribution of ridits looks closer to uniform than it otherwise would.

Oddly, Brockett and Levine do not seem to mention the fact that the mean of the ridits is 0.5 in their paper. I’m not sure why, since I think they overlooked the most important property of ridits!

© Data Science Confidential 2020 All rights reserved.