Handout
Normalizing Variables


This is a discussion of how to normalize (aka standardize) variables. The point of normalization is to make variables comparable to each other. Comparison is a problem because measurements made on nominal, ordinal, interval and ratio scales are not unique. For example, you can measure temperature in both Fahrenheit and Centigrade. Both are valid, but they produce different numbers. If you want to know whether it is warmer in Seattle or Paris on a given day, and one is 68 degrees Fahrenheit and the other is 25 degrees Centigrade, you can't just say 68 is bigger than 25 so Seattle is warmer. Instead, you need to reduce the measurements to the same scale, and then compare (68 degrees Fahrenheit is 20 degrees Centigrade, so the 25-degree city is in fact the warmer one). Normalization is the process of reducing measurements to a "neutral" or "standard" scale.

Normalizing is done differently depending on the level of measurement of the variables, and is intimately related to the uniqueness properties of the measurement level. See the handout on measurement theory for more information.

Nominal Scale Variables

  • To compare two nominal variables that may be measured using different scales, you would like to "normalize" the values so you can see how well they correspond to each other. There is no simple normalization technique to do this, but it can be done.
    • One approach: Construct a contingency table to cross-classify observations of one variable against the other. Then, if all observations falling into a given category of the row variable fall into just one category of the column variable, you can establish a 1-1 mapping of one measurement to the other, e.g., a "2" in variable A corresponds to a "92" in variable B.
    • More sophisticated approach: use a technique called correspondence analysis (particularly a variant called optimal scaling) to work out a set of scores that maximize correspondence
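
The contingency-table approach can be sketched in Python roughly as follows. This is only a minimal illustration under stated assumptions: the function name one_to_one_mapping and the sample codes are hypothetical and do not come from the handout, and the check demands an exact 1-1 correspondence with no allowance for coding errors.

```python
from collections import defaultdict

def one_to_one_mapping(a_codes, b_codes):
    """Cross-classify two nominal codings; return the A -> B mapping if it is 1-1, else None."""
    # Contingency table: table[a][b] = number of observations coded a in A and b in B.
    table = defaultdict(lambda: defaultdict(int))
    for a, b in zip(a_codes, b_codes):
        table[a][b] += 1

    mapping = {}
    for a, row in table.items():
        if len(row) != 1:
            # Category a of A is spread over several categories of B.
            return None
        mapping[a] = next(iter(row))

    # Two different A categories must not land in the same B category.
    if len(set(mapping.values())) != len(mapping):
        return None
    return mapping

# Hypothetical data: the same five objects coded under two schemes.
variable_a = [1, 2, 2, 3, 1]
variable_b = [17, 92, 92, 40, 17]
print(one_to_one_mapping(variable_a, variable_b))  # {1: 17, 2: 92, 3: 40}
```

When the function returns None, the two codings do not line up exactly, and a technique such as correspondence analysis would be the next thing to try.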

Ordinal Scale

Object M1 M2 M3
A 22 0 99
B 22 0 99
C 22 0 99
D 23 1 150
E 24 67 152
  • To normalize an ordinal scale, you convert the values to rank-order values. For example, normalizing each of the scales above would yield:
 
Object M1* M2* M3*
A 1 1 1
B 1 1 1
C 1 1 1
D 2 2 2
E 3 3 3
  • By normalizing variables, you can see whether a set of measured variables are really measuring the same thing. That is, you take away numerical differences that are arbitrary (due to different measurement properties) and leave only the differences that reflect differences in the underlying property being measured.
  • Note: we tend to use an asterisk after a variable name to indicate the normalized version of the variable
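
The rank-order conversion can be sketched in Python as below. The helper name dense_ranks is an assumption made for illustration. It follows the tie convention the tables above appear to use (tied values share a rank and the next distinct value gets the next integer); other tie conventions, such as average ranks, exist and would give different numbers.

```python
def dense_ranks(values):
    """Replace each value by its rank order (1 = smallest); tied values share a rank."""
    # Map each distinct value to its position among the sorted distinct values.
    rank_of = {v: i for i, v in enumerate(sorted(set(values)), start=1)}
    return [rank_of[v] for v in values]

# The three ordinal measurements from the table above (objects A to E).
m1 = [22, 22, 22, 23, 24]
m2 = [0, 0, 0, 1, 67]
m3 = [99, 99, 99, 150, 152]

for name, column in (("M1*", m1), ("M2*", m2), ("M3*", m3)):
    print(name, dense_ranks(column))  # each line shows [1, 1, 1, 2, 3]
```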

Interval Scale

  • Uniqueness. Interval scales are unique up to a linear transformation (Y = mX+b). In other words, if you measure a set of objects on an interval scale, and then multiply each value by a constant and/or add a constant to each value, the resulting values are just as valid as the original values. This is because the ratios of the intervals between the numbers are not affected by linear transformations. The following measurements are equally valid:
Object M1 M2 M3 M4
A 22 32 220 230
B 22 32 220 230
C 22 32 220 230
D 23 33 230 240
E 24 34 240 250
  • To normalize an interval scale, you perform a linear transformation that creates a normalized version of the variable with the property that the mean is zero and the standard deviation is one. This linear transformation is called standardizing or reducing to z-scores. Normalizing each of the variables above would yield:
Object M1* M2* M3* M4*
A -0.75 -0.75 -0.75 -0.75
B -0.75 -0.75 -0.75 -0.75
C -0.75 -0.75 -0.75 -0.75
D 0.50 0.50 0.50 0.50
E 1.75 1.75 1.75 1.75
  • Note that all the values are the same -- this indicates that all four columns are just linear transformations of each other and therefore, from an interval scaling point of view, say exactly the same thing.
  • Note also that the standardized values can be interpreted as (standard) deviations from the mean. D is half a standard deviation above the mean of all objects on this variable, while E is quite a bit higher than the mean.
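
A minimal Python sketch of the z-score calculation follows. The function name z_scores is hypothetical. The sketch assumes the population standard deviation (dividing by n rather than n - 1), since that is the choice that reproduces the values in the table above; it does not handle a column whose values are all equal, where the standard deviation is zero.

```python
import math

def z_scores(values):
    """Standardize a column: subtract the mean, divide by the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation (divide by n), an assumption that matches the table above.
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

# Four interval-scale measurements that are linear transformations of each other.
m1 = [22, 22, 22, 23, 24]
m2 = [v + 10 for v in m1]        # add a constant
m3 = [v * 10 for v in m1]        # multiply by a constant
m4 = [v * 10 + 10 for v in m1]   # do both

for name, column in (("M1", m1), ("M2", m2), ("M3", m3), ("M4", m4)):
    # Every column comes out as approximately -0.75, -0.75, -0.75, 0.5, 1.75.
    print(name, [round(z, 2) for z in z_scores(column)])
```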

Ratio Scale

  • Uniqueness. Ratio scales are unique up to a congruence or proportionality transformation (Y = mX). In other words, if you measure a set of objects on a ratio scale, and then multiply each value by a constant, the resulting values are just as valid as the original values. This is because the ratios between the numbers are not affected by congruence transformations. The measurements M1, M2 and M3 are equally valid measures of a given object property, but M4 is not measuring the same thing:
Object M1 M2 M3 M4
A 22 220 11 12
B 22 220 11 12
C 22 220 11 12
D 23 230 11.5 13
E 24 240 12 14
  • To normalize a ratio scale, you perform a particular "congruence" or "similarity" transformation that creates a normalized version of the variable with the property that the length of the vector is 1 (i.e., the Euclidean or L2 norm equals 1.0). In other words, to normalize a ratio-scaled variable, we divide each value of the variable by the square root of the sum of squares of all the original values. Normalizing each of the variables above would yield:
Object M1* M2* M3* M4*
A 0.44 0.44 0.44 0.43
B 0.44 0.44 0.44 0.43
C 0.44 0.44 0.44 0.43
D 0.45 0.45 0.45 0.46
E 0.47 0.47 0.47 0.50
  • Note that all the values except the last column are the same -- this indicates that the first three columns are just rescalings (in a ratio sense) of each other and therefore measure exactly the same thing. The last column is different, however, indicating that it measures something else.
  • Note also that other ways of normalizing accomplish the same goal of making different measurements comparable. So we could just divide each column by the column sum, creating a new variable whose values add to 1. This allows interpretation of the rescaled values as proportions or shares of the whole. This is not the usual way but it works fine.
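
Both ratio-scale normalizations described above can be sketched in Python as follows. The function names unit_length and shares are assumptions chosen for illustration. The first divides by the Euclidean norm and the second by the column sum; neither handles an all-zero column.

```python
import math

def unit_length(values):
    """Divide each value by the square root of the sum of squares (Euclidean norm)."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values]

def shares(values):
    """Alternative: divide each value by the column sum, so the results add to 1."""
    total = sum(values)
    return [v / total for v in values]

# The four measurements from the ratio-scale table above.
columns = {
    "M1": [22, 22, 22, 23, 24],
    "M2": [220, 220, 220, 230, 240],
    "M3": [11, 11, 11, 11.5, 12],
    "M4": [12, 12, 12, 13, 14],   # M1 minus 10: not a rescaling of M1
}

for name, column in columns.items():
    # M1, M2 and M3 give matching values; M4 does not.
    print(name, [round(v, 2) for v in unit_length(column)])
```

Either function gives identical results for M1, M2 and M3, because a constant multiplier cancels out of both the numerator and the denominator.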

Difference Scale (aka Additive Scale)

  • Uniqueness. Additive scales are unique up to a "translation" transformation (Y = X + b). In other words, if you measure a set of objects on an additive scale, and then add a constant to each value, the resulting values are just as valid as the original values. This is because the intervals between values are not affected by translation transformations. The measurements M1 and M2 are equally valid measures of a given object property, but M3 is not measuring the same thing:
Object M1 M2 M3
A 22 12 11
B 22 12 11
C 22 12 11
D 23 13 11.5
E 25 15 12
  • To normalize an additive scale, you perform a particular translation transformation that creates a normalized version of the variable with the property that the mean of the transformed vector is 0. To do this, we just subtract the mean of the original values. Normalizing each of the variables above would yield:
Object M1* M2* M3*
A -0.8 -0.8 -0.3
B -0.8 -0.8 -0.3
C -0.8 -0.8 -0.3
D 0.2 0.2 0.2
E 2.2 2.2 0.7
  • Note that all the values except the last column are the same -- this indicates that the first two columns are just translations of each other (they differ only by an added constant) and therefore say exactly the same thing. The last column is different, however, indicating that it measures something else.
  • Note also that other ways of normalizing accomplish the same goal of making different measurements comparable. So we could just subtract the column minimum from each value, creating a new variable whose smallest value is 0.
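
Mean-centering, the normalization used for the additive-scale table above, can be sketched in Python as follows. The function name center is hypothetical, and the rounding to one decimal place is only there to match the table.

```python
def center(values):
    """Subtract the column mean from each value so the result has mean 0."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

# The three measurements from the additive-scale table above.
m1 = [22, 22, 22, 23, 25]
m2 = [12, 12, 12, 13, 15]      # M1 minus 10: a pure translation
m3 = [11, 11, 11, 11.5, 12]    # not a translation of M1

for name, column in (("M1", m1), ("M2", m2), ("M3", m3)):
    # M1 and M2 give matching values; M3 does not.
    print(name, [round(v, 1) for v in center(column)])
```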

Absolute Scale

  • Uniqueness. Absolute scales are unique up to an identity transformation (Y = X). In other words, they are completely unique and no (non-trivial) transformation of the numbers is permissible.
  • As a result of their uniqueness, no normalization of absolute-scaled variables is needed (or possible).
 
 
