Measurement is the assignment of numbers to objects in such a way that physical relationships and operations among the objects correspond to arithmetic relationships and operations among the numbers.
Both objects and relationships are mapped: All measurement is a form of modeling; it embodies a primitive theory of how the objects work.
Different levels of measurement are defined by the number and kind of correspondences that hold between the physical relations among objects and the arithmetic relations among the scores.
But not all mathematical relationships among measured values have a counterpart in physical operations. For example, the breakdown of an object's weight into its prime factors doesn't say anything about the object itself. Similarly, if the weight of one object happens to be the log of the weight of another, this does not imply any special relationship among the objects.
The different levels of measurement are distinguished by which arithmetic operations on the measured scores have counterparts in physical operations on the objects. To put it another way, levels of measurement are distinguished by which arithmetic operations on the scores are meaningful. In fact, in a deep sense, the theory of measurement is a theory of meaningfulness; this will become clearer as we go along! As an aside, I would point out that any time we talk about meaning, we are talking about some kind of mapping from one system to another. To understand something is to build a model of it; it is to translate it into different terms.
The rules that distinguish different scales of measurement also define the uniqueness of measurement: the number and kind of alternative measurements that are valid. What I'm talking about is the existence of things like alternative units. For example, if I can measure height in meters (I'm 1.73 meters tall), then I can also measure it in centimeters (173) or inches (68). These are all equally valid. This also will become clearer as we go along.
A function is a mapping of the objects in one set to the objects in another. The two sets can be the same. For example, the function Y = X^2 is a mapping of real numbers to real numbers. We use the notation f(x) = x^2 to define a function. You can think of f(x) as saying "a function of x". The right-hand side of the equation defines exactly what function of x it is.
If we are measuring the height of different persons, we might represent the measured height of Steve as f(Steve), which might have the value f(Steve) = 68. To refer to the height of someone who is only half as tall as Steve, we might write f(Joe) = 0.5*f(Steve).
In nominal measurement, we assign numeric scores in such a way that only equality of scores has meaning for the attribute being measured. For example, consider measuring weight on a nominal scale. Suppose that in my system of measurement, my weight is assigned a "12". Now suppose that we compare my weight to yours. If you also were assigned a 12, this would mean that we both had the same weight. But if you had any other score, such as 15, all we could say is that our weights differ: we could not say that you weigh more than me. So in nominal scale measurement, the only physical property preserved or captured by the numeric scores is equality:
x is the same weight as y | if and only if | f(x) = f(y) |
Note the tremendous lack of uniqueness of nominal scales: any other assignment of numbers that preserved that property (that is, any one-to-one recoding of the scores) would be just as good, and there are an infinite number of them.
In order to tell whether two sets of measurements are the same, you need to recode both of them so they use the same codes, then compare them: the fact that they initially have different values doesn't mean anything.
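The recode-and-compare step can be sketched in Python. This is only an illustration under stated assumptions: the function name `same_nominal_partition` and the sample scores are made up for the example, and both lists are assumed to score the same objects in the same order. Instead of literally recoding, the sketch checks whether some one-to-one recoding exists that turns one set of codes into the other, which amounts to the same thing.

```python
def same_nominal_partition(codes_a, codes_b):
    """Return True if the two codings differ only by a one-to-one relabeling.

    Both lists are assumed to score the same objects in the same order
    (an assumption of this sketch, not something the text specifies).
    """
    if len(codes_a) != len(codes_b):
        raise ValueError("both codings must cover the same objects")
    a_to_b = {}
    b_to_a = {}
    for a, b in zip(codes_a, codes_b):
        # setdefault stores the first pairing seen and returns it afterwards,
        # so any later conflicting pairing shows up as a mismatch.
        if a_to_b.setdefault(a, b) != b or b_to_a.setdefault(b, a) != a:
            return False
    return True


# Hypothetical nominal "weight" codes for four objects under three systems.
system_one = [12, 15, 12, 7]
system_two = [3, 99, 3, 40]    # different numbers, same equalities
system_three = [3, 3, 3, 40]   # merges two objects that system_one keeps apart

print(same_nominal_partition(system_one, system_two))
print(same_nominal_partition(system_one, system_three))
```

The first comparison succeeds even though no score matches numerically, because equality among objects is the only thing a nominal scale preserves; the second fails because it declares two objects equal that the first system distinguishes.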
In ordinal measurement, we assign numeric scores in such a way that not only equality of scores but ordinality of scores have meaning for the attribute being measured. For example, let us measure weight on an ordinal scale. Suppose that in my system of measurement, my weight is assigned a "12". Now suppose we compare my weight to yours. If you also were assigned a 12, this would mean that we both had the same weight. So far, that's the same as in nominal measurement.
But if you had a weight of 24, this would mean not only that we have different weights, but that you weigh more than I do, because 24 is bigger than 12. However, we can't say how much more you weigh than I do. For example, if A weighs 12 and B weighs 16 and C weighs 24, we cannot say that the difference in weight between A and B is half of what the difference between B and C is. All we know is that C weighs the most, A weighs the least, and B is in between.
So in an ordinal scale, the only physical properties preserved or captured by the measured scores are equality and ordinality:
x is the same weight as y | if and only if | f(x) = f(y) |
x weighs more than y | if and only if | f(x) > f(y) |
Again, any method of assigning numeric scores that satisfies these two rules is a valid ordinal measurement. This means that ordinal measurements are unique only up to a monotone (strictly increasing) transformation.
In order to tell whether two sets of measurements are the same, you need to rank order both sets and then compare them: the fact that they initially have different values doesn't mean anything.
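A minimal Python sketch of the rank-and-compare step follows. The helper `ranks` and the sample scores are assumptions made for illustration, and tied scores are given the same rank (one common convention; the text does not specify how ties should be handled).

```python
def ranks(scores):
    """Rank scores from 1 (smallest); tied scores share the same rank."""
    distinct = sorted(set(scores))
    position = {value: i + 1 for i, value in enumerate(distinct)}
    return [position[s] for s in scores]


# Hypothetical ordinal weight scores for the same four objects.
scale_one = [12, 16, 24, 12]
scale_two = [1, 2, 1000, 1]      # a strictly increasing recoding of scale_one
scale_three = [12, 24, 16, 12]   # second and third objects change order

print(ranks(scale_one) == ranks(scale_two))
print(ranks(scale_one) == ranks(scale_three))
```

The first two scales agree once ranked, even though the gaps between their scores are wildly different, because a strictly increasing transformation leaves the ordering untouched. The third scale disagrees because it reverses the order of two objects.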
In interval measurement, we assign numeric scores in such a way that not only equality and ordinality of scores have meaning, but also the intervals between the scores. For example, let us measure weight on an interval scale. Suppose that in my system of measurement, my weight is assigned a "12". Now suppose we compare my weight to yours. If you also were assigned a 12, this would mean that we both had the same weight. But if you had a 24, this would mean that we not only had different weights, but that you weigh more than I do. So far, this is the same as ordinal measurement. But here is the different part: if A weighs 12 and B weighs 16 and C weighs 24, we can say that the difference in weight between A and B is half of what the difference between B and C is. The differences between scores have meaning now.
However, we still can't say that if f(A) is 12 and f(C) is 24, then C weighs twice as much as A. To see why, consider the temperature of two cities, measured in degrees Fahrenheit. City A is 80 degrees and city B is 40 degrees. We are tempted to say city A is twice as hot. But suppose that instead we measure the temperature in centigrade. You know that to get from Fahrenheit to centigrade we subtract 32 and multiply by 5/9. So city A is 27 degrees centigrade and city B is 4 degrees centigrade. City A no longer appears to be twice as hot: now it looks more like 7 times as hot (27/4 is about 6.75). Now you know both centigrade and Fahrenheit are equally valid measuring scales of temperature, yet they are giving really different impressions of the relative temperature of these two cities! The problem is that they are interval-scale measures of temperature, and it is not meaningful to say that something is twice something else when you measure it on an interval scale!
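The two-city comparison can be checked with a few lines of Python. This is just a sketch of the arithmetic; the function and variable names are made up for the example. With unrounded centigrade values the ratio comes out at about 6 rather than the 7 obtained from the rounded figures, but either way it is nowhere near 2.

```python
def fahrenheit_to_centigrade(f):
    """Subtract 32, then multiply by 5/9."""
    return (f - 32) * 5 / 9


city_a_f, city_b_f = 80.0, 40.0
city_a_c = fahrenheit_to_centigrade(city_a_f)   # about 26.7
city_b_c = fahrenheit_to_centigrade(city_b_f)   # about 4.4

ratio_f = city_a_f / city_b_f   # ratio of the Fahrenheit scores
ratio_c = city_a_c / city_b_c   # ratio of the centigrade scores

print(f"ratio in Fahrenheit: {ratio_f:.2f}")
print(f"ratio in centigrade: {ratio_c:.2f}")
```

The ratio of the scores depends on which of the two equally valid scales we happen to use, so it cannot be telling us anything about the cities themselves.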
So in interval scale measurement, the only physical properties preserved or captured by the measured scores are equality, ordinality, and interval ratios:
x is the same weight as y | if and only if | f(x) = f(y) |
x weighs more than y | if and only if | f(x) > f(y) |
the difference in weight between A and C is k times as big as the difference between A and B | if and only if | k = (f(A)-f(C))/(f(A)-f(B)) |
Any method of assigning numeric scores that satisfies these three rules is a valid interval measurement, and this turns out to mean that interval measurements are unique up to a linear transformation of the values. A linear transformation is one that looks like this: g(x) = m*f(x) + b where m and b are constants, m is positive, and b may be negative. The transformation from Fahrenheit to centigrade is like this:
C = (F-32)*5/9,
C = (5/9)*F - 17.8 (approximately).
In order to tell whether two sets of interval measurements are the same, you need to standardize both of them and then compare them: the fact that they initially have different values doesn't mean anything. To standardize an interval measurement, we subtract the mean value from each score and divide by the standard deviation.
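Here is a small Python sketch of the standardize-and-compare step. The names and the sample temperatures are assumptions for illustration, and the sketch uses the population standard deviation (`statistics.pstdev`); the text does not say which form of the standard deviation is intended, and either works as long as both sets are treated the same way.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Subtract the mean and divide by the (population) standard deviation."""
    m = mean(scores)
    s = pstdev(scores)
    if s == 0:
        raise ValueError("cannot standardize scores that are all equal")
    return [(x - m) / s for x in scores]


# Hypothetical temperatures of three cities on two interval scales.
fahrenheit = [40.0, 50.0, 80.0]
centigrade = [(f - 32) * 5 / 9 for f in fahrenheit]

z_f = standardize(fahrenheit)
z_c = standardize(centigrade)

# Compare with a tolerance, since floating-point results are not exact.
same = all(abs(a - b) < 1e-9 for a, b in zip(z_f, z_c))
print(same)
```

Subtracting the mean removes the constant shift and dividing by the standard deviation removes the multiplier, so two interval scales related by a linear transformation with a positive multiplier give the same standardized scores.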
In ratio-scale measurement, we assign numeric scores in such a way that not only equality and ordinality and the intervals between the scores have meaning, but also ratios of the scores. For example, let us measure weight on a ratio scale. Suppose that in my system of measurement, my weight is assigned a "12". Now suppose we compare my weight to yours. If you also were assigned a 12, this would mean that we both had the same weight. But if you had a 24, this would mean that we not only had different weights, but you weigh more than I do. Furthermore, if A weighs 12 and B weighs 16 and C weighs 24, we can say that the difference in weight between A and B is half of what the difference between B and C is. The difference between numbers has meaning. And, if your measured weight is 24 and my measured weight is 12, then we can say that you weigh twice as much as I do (finally!).
So in ratio scale measurement, the following properties are preserved by the measured scores:
x is the same weight as y | if and only if | f(x) = f(y) |
x weighs more than y | if and only if | f(x) > f(y) |
the difference in weight between A and C is k times as big as the difference between A and B | if and only if | k = (f(A)-f(C))/(f(A)-f(B)) |
x weighs k times as much as y | if and only if | k = f(x)/f(y) |
Any method of assigning numeric scores that satisfies these four rules is a valid ratio-scale measurement, and this turns out to mean that ratio measurements are unique up to a congruence or proportionality transformation of the values. A congruence transformation is one that looks like this: g(x) = m*f(x) where m is a positive constant. For example, we can get from inches to centimeters by this equation:
C = 2.54*I
In order to tell whether two sets of ratio measurements are the same, you need to normalize both of them and then compare them: the fact that they initially have different values doesn't mean anything. To normalize a ratio variable, we divide each value by the square root of the sum of the squares of all the values.
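The normalize-and-compare step can be sketched in Python as follows. The function name and the sample heights are assumptions made for the example.

```python
import math


def normalize(scores):
    """Divide each score by the square root of the sum of squared scores."""
    length = math.sqrt(sum(x * x for x in scores))
    if length == 0:
        raise ValueError("cannot normalize scores that are all zero")
    return [x / length for x in scores]


# Hypothetical heights of three people on two ratio scales.
inches = [60.0, 68.0, 72.0]
centimeters = [2.54 * i for i in inches]

n_in = normalize(inches)
n_cm = normalize(centimeters)

# Compare with a tolerance, since floating-point results are not exact.
same = all(abs(a - b) < 1e-9 for a, b in zip(n_in, n_cm))
print(same)
```

Multiplying every score by the same positive constant multiplies the divisor by that constant too, so the unit cancels out and the two normalized sets agree. Note that nothing is subtracted here: on a ratio scale the zero point is meaningful and has to be left where it is.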