Saturday, March 24, 2007

The Simple Test and Complex Phenomena

Written after taking the test described by Will Thalheimer. If you want to try the test for yourself, try it here. Via Marc Oehlert, who says, "Honestly, your score on this is probably as good an indicator of your performance in this field as any certification program going." Either he thinks very poorly of certification programs, or he did not read Thalheimer's analysis.

Well, I got 2 out of 15 correct. That is substantially worse than the average, which is, as you (Thalheimer) point out, barely above what test-takers would get from pure guesswork.

(Actually, the 32 percent is just about what you would expect. It's an old adage among trivia game players: 'when in doubt, pick 3' (i.e., C, the middle response). And this quiz holds true to the pattern: A was the correct response 2 times, B 4 times, C 6 times, D 2 times, E none, and F once.)
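A quick sketch of the arithmetic behind that tally, using only the counts quoted above. The variable and function names are illustrative and are not taken from the quiz or from Thalheimer's analysis. It shows what someone would score by always choosing the same letter.

```python
from fractions import Fraction

# Number of times each option was the correct response, as tallied in the post.
correct_counts = {"A": 2, "B": 4, "C": 6, "D": 2, "E": 0, "F": 1}

# The counts should add up to the 15 questions on the quiz.
total_questions = sum(correct_counts.values())


def always_pick(option):
    """Share of questions answered correctly by always choosing one option."""
    return Fraction(correct_counts[option], total_questions)


# The option that was most often correct.
best_option = max(correct_counts, key=correct_counts.get)

print(total_questions)                      # 15
print(best_option, always_pick(best_option))  # C 2/5
```

So on this quiz, blindly answering C every time would have scored 6 out of 15, or 40 percent.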

All this is a round-about way of saying: have you considered the possibility that it's the quiz that's the problem, not the quiz-takers?

I mean - I went into the test with the expectation that I might not do well. I have a healthy doubt of my own abilities. But I am not a 2 out of 15 in my own field. That's an unreasonable result.

There is, in my view, a systematic flaw in this test. And it can be expressed generally as the following:

The test author believes (based on some research, which is never cited) that "Learning is better if F" where 'F' is some principle, such as "Performance objectives that most clearly specify the desired learner behavior will produce the best instructional design."

This principle is treated as linear. That is to say, the more the principle is exemplified in the answer (per the author's interpretation) the more learning will be better.

But these principles are not linear. There is a point of diminishing returns. There is a point at which slavish adherence to the principle does more harm than good. Experienced designers understand this, and hence build some slack into the application of the principles.

Question 1 provides a good object lesson:

The feedback states: "Performance objectives that most clearly specify the desired learner behavior will produce the best instructional design."

Option B (which I selected) is: “As each web page is developed, and after the full website is developed, each web page should be tested in both Netscape Navigator and Internet Explorer.”

Option C (which is considered correct) is: Same as B, with the addition of the following: “One month after completing the training, learners should test each web page during its development at least twice 90 percent of the time, and test each web page once after the whole website is complete at least 98 percent of the time.”

Now the question is, is the performance objective "more clearly stated" in C than in B? According to the author (obviously) it is. But sometimes making things more precisely stated does not make them more clear. It does not even make them more precise.

Which is clearer:

a. Test the page after design

b. Test the page 98 percent of the time after design

In my view, (a) is clearer.

Moreover, (b) is no more precise than (a). Because what (a) means is "Test the page 100 percent of the time after design".

Therefore, it would be unreasonable to select Option C on the ground that it is clearer. The unthinking effort to make it more precise went over the top and resulted in a statement that is more an example of nonsense than of clarity.

The entire test is constructed this way. I got a couple where it was pretty obvious what the examiner was looking for. But otherwise, I picked what I felt was the best answer, which in every case was the less extreme version of the over-the-top choice.

In question number 2, for example, the principle is: "When the learning and performance contexts are similar, more information will be retrieved from memory."

Well, this is generally true. But will somebody prepare better spending a week on the road, living in a hotel, unable to keep up with work at home in Boston or to be there to help the kids? Being on the road has an impact. So even if the test is being conducted in San Francisco, there comes a point where the advantage of studying and testing in a similar environment is overwhelmed by the disadvantage of being on the road.

The test author created an extreme case - a test location in San Francisco instead of a test location in downtown Boston. Thus, complications that an experienced person would automatically take into account - the time lost in airports, the rigors of travel, etc. - are built into that person's thinking.

The only way to get through such questions is to be able to figure out what the author is looking for. In this case, I looked at the example and it was pretty clear that it would be based on 'similarity of environment' and not any real question about 'effective learning'. It was one of the two I got right.

But the author's intention is very deliberately disguised throughout the test. Or more accurately, the test addresses such a specific context that only people who work in that specific context have any real chance of divining the author's intent (and as it turns out, the context was so narrow it didn't even show up statistically).

This, I think, is one of the problems of testing generally, and not just this test in particular.

In a test like this, each question is designed to measure only one point of learning (more precisely: to measure responses only along one vector). Theoretically, you could have questions that measure more than one vector, but that results in confusing questions and too many possible responses.

If the test measures simple things, that's fine. The question of whether 2+2=4 is not going to be impacted by external considerations.

But if the test measures complex phenomena, then it is going to systematically misrepresent the student's understanding of the phenomena.

Specifically, a very simple one-dimensional understanding will fare as well as (and in this case, better than) a complex, nuanced understanding. People who understand a discipline as a set of one-dimensional principles will do the best - understanding simply becomes a case of picking which principle applies, then selecting the example that fits the best.

This test fails because it is too narrowly defined to let the simple understanders spot the principle being defined, and too dependent on single principles to give people who genuinely understand the phenomena any advantage.

The test author is right: don’t trust gurus.

Unfortunately, the test author didn't consider the possibility of recursion.
