Outline

Measurement error

Validity
Reliability

Scales of measurement

Lab 3

Measurement

In this lab you will go through a number of activities designed to give you a feel for several issues related to measurement in research. In psychology, measurement is particularly tricky because the things (variables) of interest often aren't directly observable. So we'll start today's lab with some non-psychological examples of measurement that are directly observable. Some of the activities may be done in small groups; others should be done individually. Your responses, even in the group work, should reflect your individual answers (i.e., while you do the activity as a group, each of you needs to write up your own answers to the questions). You can find the worksheet for today's lab in assignment #3 (see the ReggieNet side menu). The worksheet is a Word document. Type your answers into it, save it, and then upload the file using the attach function in assignment 3.

Key concepts:
Validity: Does the measurement accurately measure what it is intended to? How "good" is the measurement (in terms of how much error it has)?
Reliability: Do you get the same thing with multiple measurements?

Scales of measurement: What do the measurements correspond to?
    - Nominal: named categories
    - Ordinal: ordered categories
    - Interval: ordered categories of the same size
    - Ratio: ordered categories of the same size with a true absolute zero point

Group exercise (work in groups of 3 or 4 students):

Task I: Measuring personal characteristics.

Measure height of an individual in the group
step 1: Pick one person in your group.
step 2: Each person (including the volunteer) must come up with their own way of measuring how tall the volunteer is.
step 3: Compare your measurements. What are the pros and cons of each method? What scale of measurement did each of you use (i.e. nominal, ordinal, interval, or ratio)? Which was the most valid measurement (and why)? Which was the most reliable measurement (and why)?

Measure the shoe size of somebody in the group
step 1: Pick one person in your group.
step 2: Using ruler #1 (download and print out the paper Lab 3 rulers; the one with large units is ruler #1), each person should measure the length of the person's shoe to the nearest tenth (don't tell the other members of the group your measurement until everyone has measured).
step 3: Repeat step 2 using ruler #2, the one with smaller units (again to the nearest tenth).
step 4: Compare all the measurements that the group made with the two rulers. Which ruler resulted in the greatest differences in the measurements? Speculate why.

Download Lab 3 Rulers

Measure hair color of individuals in the group. Discuss what it means to "measure hair color." Can you use numbers? How else could you do it? Look around the entire classroom. How many categories of hair color do you see? As a group, discuss what needs to be considered in defining categories of hair color.

Task II: Measuring indirectly observable characteristics

Suppose that you are researchers interested in studying factors that affect how extroverted ("outgoing") people are. To investigate this, imagine that you decide to develop an instrument to measure the "outgoingness" of each student in your lab (we won't actually collect any data, just think about how we would do it). Discuss how you would go about developing an instrument to measure this character trait. What observations/measurements would you make? What would your concerns about validity and reliability be? What scale of measurement would you use?

Individual exercises

Task III: Variability, Validity & Reliability in Measurement

Measurement 1: Click on each line button. Estimate (that is make a guess, DON'T measure it with a ruler or anything else), to the nearest tenth of an inch, the length of the line. Record your answer on your worksheet.

Measurement 2: Click on each line button. Estimate, to the nearest tenth of an inch, the length of the line. Record your answer on your worksheet.

When you've finished recording your estimates into your worksheet, highlight the table below and copy the actual line lengths into the worksheet as well.

Actual Line Lengths
highlight the table to see the lengths of the lines

Measurement 1
Line 1: 2.2 inches
Line 2: 2.6 inches
Line 3: 1.6 inches
Line 4: 3.4 inches
Line 5: 1.0 inches

Measurement 2
Line 1: 1.6 inches
Line 2: 1.0 inches
Line 3: 2.6 inches
Line 4: 2.2 inches
Line 5: 3.4 inches

How accurate were your measurements? To find out, subtract the actual values from your estimates (see the tables in the worksheet).

Add up the differences in the 'total' boxes (make sure to keep track of negative and positive numbers). Then add up the totals in the grand total box and divide this by 10 (the total number of measurements). This final number is your average measurement error for your estimates; in other words, a measure of the accuracy of your estimates.
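The arithmetic described above can be sketched in Python. The actual lengths are taken from the table of actual line lengths; the estimates are made-up numbers for illustration only (substitute your own), and the variable and function names are hypothetical.

```python
# Actual line lengths in inches, in the order the lines were presented.
actual_1 = [2.2, 2.6, 1.6, 3.4, 1.0]  # Measurement 1, lines 1-5
actual_2 = [1.6, 1.0, 2.6, 2.2, 3.4]  # Measurement 2, lines 1-5

# Hypothetical estimates, invented for this sketch.
estimates_1 = [2.0, 2.5, 1.8, 3.0, 1.2]
estimates_2 = [1.5, 1.1, 2.9, 2.0, 3.5]


def signed_differences(estimates, actual):
    """Estimate minus actual for each line, keeping the sign."""
    return [round(e - a, 1) for e, a in zip(estimates, actual)]


diffs_1 = signed_differences(estimates_1, actual_1)
diffs_2 = signed_differences(estimates_2, actual_2)

# One total per measurement, then the grand total.
total_1 = sum(diffs_1)
total_2 = sum(diffs_2)
grand_total = total_1 + total_2

# Divide by 10, the total number of measurements.
average_error = grand_total / 10

print("Differences, measurement 1:", diffs_1)
print("Differences, measurement 2:", diffs_2)
print("Average measurement error:", round(average_error, 2))
```

Because the signs are kept, overestimates and underestimates cancel each other out, so an average near zero can still hide sizeable errors on individual lines.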

Now let's compare your estimates for measurements 1 and 2. These were the same lines; you measured them twice, in two different orders. So we can get an idea of how reliable your estimates were by looking at the difference between your two estimates of each line. Match up the lines of the same length using the chart below. Calculate the difference in your measurements for each line length. Add up these differences and divide by 5. This is your average random error of measurement.
Note: You measured the lines in different orders the two times, so we need to match up the orders in our calculations.

Measurement 1:   line 1   line 2   line 3   line 4   line 5

Measurement 2:   line 4   line 3   line 1   line 5   line 2
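As a rough sketch, the matching and averaging could look like this in Python. The matching comes from the chart above; the estimates are invented for illustration, and the names are hypothetical. This sketch keeps the sign of each difference, which is an assumption; follow your worksheet if it asks for something else.

```python
# Which Measurement 2 line has the same length as each Measurement 1 line.
matching = {1: 4, 2: 3, 3: 1, 4: 5, 5: 2}

# Hypothetical estimates in inches, keyed by line number (invented values).
estimates_1 = {1: 2.0, 2: 2.5, 3: 1.8, 4: 3.0, 5: 1.2}
estimates_2 = {1: 1.5, 2: 1.1, 3: 2.9, 4: 2.0, 5: 3.5}

# Difference between the two estimates of the same physical line.
differences = [
    round(estimates_1[line_1] - estimates_2[line_2], 1)
    for line_1, line_2 in matching.items()
]

# Add up the differences and divide by 5 (the number of distinct lines).
average_difference = sum(differences) / 5

print("Differences:", differences)
print("Average difference:", round(average_difference, 2))
```

Note that the actual line lengths never appear here: reliability is about whether your two estimates agree with each other, not whether they agree with the true value.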

Task IV: Scales of Measurement (and SPSS)

The classic taxonomy for discussing different types of measurement scales was proposed by Stevens (1946). Other taxonomies exist, but Stevens’ system is familiar to everyone who has been trained in the social sciences. In Stevens’ system, there are four types of scales: Nominal, Ordinal, Interval, and Ratio, which can be remembered using the acronym NOIR (French for black).

Nominal Variables

Nominal variables can take on any kind of value, including values that are not numbers. The values must constitute a set of mutually exclusive categories. For example, if I have a set of data about college students, I might record which major each person has. The variable college major consists of different labels (e.g., Accounting, Mathematics, and Psychology). Note that there is no true order to college majors, though we usually alphabetize them for convenience. There is no meaningful sense in which English majors are higher or lower than Biology majors. Nominal values are either the same or they are different. They are not less than or more than anything else.

Examples of nominal variables

Biological sex {male, female}
Race/Ethnicity {African-American, Asian-American,...}
Type of school {public, private}
Treatment group {Untreated, Treated}
Down Syndrome {present, not present}
Attachment Style {dismissive-avoidant, anxious-preoccupied, secure}
Which emotion are you feeling right now? {Happiness, Sadness, Anger, Fear}

Ordinal Variables

Like nominal variables, ordinal variables are categorical. Whereas the categories in nominal variables have no meaningful order, the categories in ordinal variables have a natural order to them. For example, questionnaires often ask multiple-choice questions like so:

I like chatting with people I do not know.
Strongly disagree
Disagree
Neutral
Agree
Strongly agree

It is clear that the response choices have an order to them. Note, however, that there is no meaningful distance between the categories. Is the distance between strongly disagree and disagree the same as the distance between disagree and neutral? It is not a meaningful question because no distance has been defined. All we can do is say which category is higher than the other.
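One way to picture this is a small Python sketch that only ever compares positions in the ordered list of responses. The function name and the sample answers are hypothetical.

```python
# Response options in their natural order, from the questionnaire item above.
RESPONSES = ["Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree"]


def is_higher(a, b):
    """Return True if response a ranks above response b."""
    return RESPONSES.index(a) > RESPONSES.index(b)


higher = is_higher("Agree", "Neutral")
lower = is_higher("Disagree", "Strongly agree")

# Sorting is allowed because order is all that an ordinal scale defines.
answers = ["Agree", "Neutral", "Strongly agree", "Disagree"]  # invented answers
ranked = sorted(answers, key=RESPONSES.index)

print(higher, lower)
print(ranked)
```

The list positions are used only as ranks. Subtracting them would assume equally spaced categories, which is exactly what an ordinal scale does not define.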

Examples of ordinal variables

Dosage {placebo, low dose, high dose}
Order of finishing a race {1st place, 2nd place, 3rd place,...}
ISAT category {below standards, meets standards, exceeds standards}
Apgar score {0,1,...,10}

Interval scales

Interval scales are quantitative. The values that interval scales take on are almost always numbers. Furthermore, the distance between the numbers has a consistent meaning. The classic example of an interval scale is temperature on the Celsius or Fahrenheit scale. The distance between 25° and 35° is 10°. The distance between 90° and 100° is also 10°. In both cases, the difference involves the same amount of heat.

Unlike with nominal and ordinal scales, we can add and subtract scores on an interval scale because there are meaningful distances between the numbers.

Interestingly, the meaning of 0°C (or 0°F) is not what we are used to thinking about when we encounter the number zero. Usually, the number zero means the absence of something. Unfortunately, the number zero does not have this meaning in interval scales. When something has a temperature of 0°C, it does not mean that there is no heat. It just happens to be the temperature at which water freezes at sea level. It can get much, much colder. Thus, interval scales lack a true zero.

Lacking a true zero, interval scales cannot be used to create meaningful ratios. For example, 20°C is not "twice as hot" as 10°C. Also, 110°F is not “10% hotter” than 100°F.
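A quick way to check these claims is to convert the temperatures to the Kelvin scale, which does have a true zero, and compute the ratios there. The sketch below uses the standard conversion formulas; the function and variable names are hypothetical.

```python
def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvins."""
    return c + 273.15


def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to kelvins."""
    return (f - 32) * 5 / 9 + 273.15


# Naive ratios computed directly on the interval scales.
naive_celsius_ratio = 20 / 10         # looks like "twice as hot"
naive_fahrenheit_ratio = 110 / 100    # looks like "10% hotter"

# The same temperatures compared on a scale with a true zero.
kelvin_ratio_c = celsius_to_kelvin(20) / celsius_to_kelvin(10)
kelvin_ratio_f = fahrenheit_to_kelvin(110) / fahrenheit_to_kelvin(100)

print(naive_celsius_ratio, round(kelvin_ratio_c, 3))
print(naive_fahrenheit_ratio, round(kelvin_ratio_f, 3))
```

On the Kelvin scale, 20°C is only about 3.5% above 10°C, and 110°F is only about 1.8% above 100°F. The naive ratios change with the arbitrary placement of zero, which is why they carry no meaning.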

Nearly interval scales

In truth, there are very, very few examples of variables with a true interval scale. However, a large percentage of variables used in the social sciences are treated as if they are interval scales. It turns out that with a bit of fancy math, many ordinal variables can be transformed, weighted, and summed in such a way that the resulting score is reasonably close to having interval properties. The advantage of doing this is that, unlike with nominal and ordinal scales, you can calculate means, standard deviations, and a host of other statistics that depend on there being meaningful distances between numbers.

Psychological and educational measures regularly make use of these procedures. For example, on tests like the ACT, we take information about which questions were answered correctly and then transform the scores into a scale that ranges from 1 to 36. As a group, people who score higher on the ACT tend to perform better in college than people who score lower. Of course, many individuals perform much better than their ACT scores suggest. An equal number of individuals perform much worse than their ACT scores suggest. Among many other things, thirst for knowledge and hard work matter quite a bit. Even so, on average, individuals with a 10 on the ACT are likely to perform worse in college than people with a 20. By roughly the same amount, people with a 30 on the ACT are likely to perform better in college than people with a 20. Again, we are talking about averages, not individuals. Every day, some people beat expectations and some people fail to meet them, often by wide margins.

Examples of interval scales

Truly interval:
Temperature on the Celsius and Fahrenheit scale (not on the Kelvin scale)
Calendar year (e.g., 431BC, 1066AD)
Notes on an equal-tempered instrument such as a piano {A, A#, B, C, C#, D, D#, E, F, F#, G, G#}
A ratio scale converted to a z-score metric or any other kind of standard score metric (more on z-scores later in the semester)
Nearly interval:
Most scores from well-constructed ability tests (e.g., IQ, ACT, GRE) and personality measures (e.g., self-esteem, extroversion).
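To see why a ratio scale converted to z-scores ends up as an interval scale, here is a minimal sketch. The hours-of-study values are invented for illustration, and using the population standard deviation is an assumption made for this example (z-scores are covered properly later in the semester).

```python
import statistics

# Hypothetical hours of study (a ratio scale); values invented for illustration.
hours = [2, 4, 6, 8, 10]

mean = statistics.mean(hours)
sd = statistics.pstdev(hours)  # population SD, an assumption for this sketch

# z-score: how many standard deviations each value lies from the mean.
z_scores = [(x - mean) / sd for x in hours]

print([round(z, 2) for z in z_scores])
```

A z-score of 0 marks the mean, not the absence of studying, so the true zero is gone. Ten hours is five times two hours, but dividing their z-scores gives -1, which means nothing. Differences between z-scores, however, still reflect equal steps in hours.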

Ratio Scales

A ratio scale has all of the properties of an interval scale. In addition, it has a true zero. When a ratio scale has a value of zero, it indicates the absence of the quantity being measured. For example, if I say that I have 0 coins in my pocket, there are no coins in my pocket. The fact that ratio scales have true zeroes means that ratios are meaningful. For example, if you have 2 coins and I have one, you have twice as many coins as I do. If I have 100 coins and then you give me 10 more, the number of coins I have has increased by 10%.
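The coin examples translate directly into arithmetic. This is just a small sketch with hypothetical variable names:

```python
# "You have 2 coins and I have one."
your_coins = 2
my_coins = 1
times_as_many = your_coins / my_coins

# "I have 100 coins and then you give me 10 more."
before = 100
after = before + 10
percent_increase = (after - before) * 100 / before

print(times_as_many)      # how many times as many coins you have
print(percent_increase)   # percentage increase in my coins
```

These ratios are meaningful only because 0 coins really does mean no coins at all.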

Examples of Ratio Scales

Ratio scales involve countable quantities, such as:

coins
marbles
computers
speeding tickets
pregnancies
soldiers
planets

Many physical properties are also ratio scales, such as:

distance
mass
force
heat (on the Kelvin scale)
pressure
voltage
acceleration
proportions

These dimensions are not discrete countable quantities like cars and bricks but are instead continuous quantities that can be measured with decimals and fractions.

Notice that even though ratio variables have a true zero, on some of them it is possible to have negative numbers. For example, negative acceleration would indicate a slowing down. A negative value in a checking account means that you owe the bank money.

In the social sciences, there are many examples of ratio scales:

Income
Age
Years of education
Reaction time
Family size
Hours of study
Percentage of household chores completed (compared to other members of the household)

Consider the following measurements. For each one, indicate which scale of measurement you think would be most appropriate (Nominal, Ordinal, Interval, or Ratio). Type your answers and rationale into your Assignment file.

1. Family size: 1 child, 2 children, 3 children, ...
2. Customer satisfaction: Poor, Fair, Good, Excellent
3. Height measured by questionnaire: "I am: very short, short, average, tall, very tall"
4. Height measured by tape measure (in inches)
5. Cola brands in rank order of preference
6. Reaction time measured in milliseconds
7. Zip Codes: 61548, 61761, 62461, 47424, 65233
8. Age in years

Scales of Measurement in SPSS


Consider the data file that you created in Lab 2 (last week). The scale of measurement for each variable is specified in the Measure column of the Variable View.

Measure - This column specifies the variable's scale of measurement. The three options are Scale, which covers both interval and ratio scales, Ordinal and Nominal. SPSS treats ratio and interval scales in the same mathematical way, so these are specified as "Scale" variables.
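The correspondence between Stevens' four scales and the three SPSS options can be summarized as a simple lookup. This is only an illustrative Python sketch, not SPSS syntax; the names are hypothetical, and the example variables are taken from the lists earlier in this lab.

```python
# Stevens' scale of measurement -> option in the SPSS Measure column.
SPSS_MEASURE = {
    "nominal": "Nominal",
    "ordinal": "Ordinal",
    "interval": "Scale",  # SPSS treats interval and ratio alike
    "ratio": "Scale",
}


def spss_measure(scale):
    """Return the SPSS Measure setting for a Stevens scale name."""
    return SPSS_MEASURE[scale.lower()]


# Example variables from earlier sections of this lab.
examples = {
    "College major": "nominal",
    "Dosage": "ordinal",
    "Temperature (Celsius)": "interval",
    "Income": "ratio",
}

for variable, scale in examples.items():
    print(f"{variable}: {scale} -> {spss_measure(scale)}")
```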

So in the end your variable view of the file that you created last time should have looked something like this.