Item Analysis

April 7, 2009 at 2:00 am

I was lucky enough to spend some time today with Sharon Shrock and Bill Coscarelli, authors of the book Criterion-Referenced Test Development (3rd Edition), and I was reminded of the first time that I understood the basics of item analysis. I wanted to share some knowledge with you that others have found useful.

The following discussion assumes a four-choice multiple choice question/item, as that makes it easier to explain and understand. However, the same analysis can be applied to the “outcome” of the participant answering the question, even if the item had tens of potential outcomes. For example, if you had a drag and drop item, or a multiple response item, there would be a limited number of responses (even if that number were large) and thereby a limited number of outcomes.

Before I go off on a tangent, we’ll focus on a multiple choice item with four choices for this explanation.

Let’s imagine that we presented an item where all of the answers were equally right or equally confusing, so that all participants had to resort to guessing. We’d end up with 25% of participants selecting each choice. If there were more choices, a smaller percentage would guess the right answer. So a four-choice multiple choice question has a guessing factor of 25%.


The table above illustrates this, showing the four choices (A, B, C and D), the number of respondents that selected each choice (1,000 each) and the resulting percentage calculated in the third column.

In the real world we’d hope that people don’t have to resort to guessing (although some people have to), and so we might see results more like this:
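The pure-guessing case can be sketched in a few lines of Python. This is only an illustration: the function name `selection_percentages` is my own invention, and the counts are the 1,000-per-choice figures described above.

```python
def selection_percentages(counts):
    """Return the percentage of respondents that selected each choice."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no responses recorded")
    return {choice: 100 * n / total for choice, n in counts.items()}


# The pure-guessing example: 1,000 respondents selecting each of four choices.
counts = {"A": 1000, "B": 1000, "C": 1000, "D": 1000}
percentages = selection_percentages(counts)

# With one right answer among k equally attractive choices,
# the guessing factor is 1 / k.
guessing_factor = 1 / len(counts)

print(percentages)
print(guessing_factor)
```

With five choices instead of four, the same arithmetic gives a guessing factor of 1/5, or 20%.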


These results were taken from an actual item delivered in a test by a Questionmark customer and provided for me to use as an example.  The right choice was C and I’ve highlighted this to make it easy to follow.

From the percentage we can determine the Difficulty or “P value”, which is simply the percentage of people selecting the right choice expressed as a proportion between 0 and 1; so 64% becomes 0.64.
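As a rough sketch, the Difficulty can be computed directly from the list of choices that participants selected. The function name `difficulty` and the response counts below are hypothetical; they are made up so that 64 out of 100 respondents pick C, matching the 64% in the example, and are not the customer's actual data.

```python
def difficulty(responses, correct_choice):
    """Proportion of respondents that selected the right choice (the P value)."""
    if not responses:
        raise ValueError("no responses recorded")
    correct = sum(1 for r in responses if r == correct_choice)
    return correct / len(responses)


# Hypothetical data: 100 respondents, 64 of whom selected the right choice, C.
responses = ["A"] * 10 + ["B"] * 14 + ["C"] * 64 + ["D"] * 12
p_value = difficulty(responses, "C")
print(p_value)
```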


Now let’s imagine that everyone selected the right choice. We’d have 100% of people selecting it, which would result in a Difficulty of 1.0. An item that everyone gets right is not particularly helpful, and we might consider dropping it from our tests as it would not seem to discriminate between the more and less competent/knowledgeable. Conversely, if no one selected the right choice (i.e. everyone got it wrong), we’d have a Difficulty of 0.0, which would tell us the question is too hard.

Okay, so we get that we need a Difficulty above 0 and below 1. In our example it is 0.64, and most often good test items yield Difficulty levels above 0.6 and below 0.9.
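If you wanted to screen a bank of items against that rule of thumb, a small helper along these lines would do. The function name `review_difficulty` and its messages are hypothetical, and the 0.6 and 0.9 defaults are just the typical range mentioned above, not a fixed standard.

```python
def review_difficulty(p_value, low=0.6, high=0.9):
    """Give a rough verdict on an item's Difficulty (P value)."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("Difficulty must be between 0 and 1")
    if p_value == 1.0:
        return "everyone got it right; the item does not discriminate"
    if p_value == 0.0:
        return "no one got it right; the item may be too hard"
    if p_value <= low:
        return "harder than the typical range"
    if p_value >= high:
        return "easier than the typical range"
    return "within the typical range"


print(review_difficulty(0.64))
```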

Next we can calculate the Participant Mean, which is the average test score of all of the Participants that selected a given choice. In our example, all the Participants that selected choice “A” had their final test scores calculated, and when we then calculated the mean we determined that it was 35%. We can calculate this statistic for each choice, and in our example we can see that the highest Participant Mean is derived from the people selecting choice C, the correct choice; very reassuring.


It is also reassuring that Participant Mean has nothing to do with how mean the Participants are!
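The Participant Mean for each choice can be sketched as a simple group-and-average. The function name `participant_means` and the records below are hypothetical; I have only chosen the scores so that choice A averages 35%, as in the example.

```python
from collections import defaultdict
from statistics import mean


def participant_means(records):
    """Average final test score of the participants that selected each choice.

    records: iterable of (choice_selected, final_test_score_percent) pairs.
    """
    scores_by_choice = defaultdict(list)
    for choice, score in records:
        scores_by_choice[choice].append(score)
    return {choice: mean(scores) for choice, scores in scores_by_choice.items()}


# Hypothetical records: the choice each participant selected on this item,
# paired with that participant's score on the whole test.
records = [
    ("A", 30), ("A", 40),
    ("B", 45),
    ("C", 80), ("C", 70), ("C", 75),
    ("D", 50),
]
means = participant_means(records)
print(means)
print(max(means, key=means.get))  # the choice with the highest Participant Mean
```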

Now things get a little more interesting. 

We can calculate the Outcome Discrimination, which is a number from -1 to +1. It is a calculation based on the contrast between the upper and lower scoring groups that selected a choice. For the right choice, a value of -1 would indicate that this is an extremely bad item, as only people that fail the test selected this choice. Whereas a +1 would provide a perfect Outcome Discrimination, indicating that all the people in the upper group selected this choice.
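Here is a minimal sketch of one common way to contrast the two groups: take the proportion of the upper scoring group that selected a choice and subtract the proportion of the lower scoring group that selected it. The exact formula and group sizes that Questionmark uses are not given in this post, so the function name `outcome_discrimination`, the 27% group fraction (a conventional choice) and the records are all assumptions of mine.

```python
def outcome_discrimination(records, choice, group_fraction=0.27):
    """Upper-group minus lower-group proportion selecting `choice`.

    records: list of (choice_selected, total_test_score) pairs.
    group_fraction: assumed share of participants in each group.
    """
    if len(records) < 2:
        raise ValueError("need at least two participants")
    if not 0 < group_fraction <= 0.5:
        raise ValueError("group_fraction must be in (0, 0.5]")
    ranked = sorted(records, key=lambda record: record[1])
    n = max(1, int(len(ranked) * group_fraction))
    lower, upper = ranked[:n], ranked[-n:]
    p_upper = sum(1 for c, _ in upper if c == choice) / n
    p_lower = sum(1 for c, _ in lower if c == choice) / n
    return p_upper - p_lower


# Hypothetical records: (choice selected on this item, total test score).
records = [
    ("A", 20), ("B", 25), ("A", 35), ("D", 40), ("B", 50),
    ("C", 55), ("C", 60), ("D", 65), ("C", 85), ("C", 90),
]
print(outcome_discrimination(records, "C"))  # right choice: want it positive
print(outcome_discrimination(records, "A"))  # wrong choice: want it negative
```

With ten participants and a 27% fraction, each group holds two people, so the statistic here is deliberately coarse; real item analysis would use far more responses.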


Outcome Correlation is similar to Outcome Discrimination but provides for a more extensive calculation, as it draws on all participants rather than only the upper and lower groups. Just as with Outcome Discrimination, -1 is bad for the right choice but good for an incorrect choice.
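One way to get a correlation-style statistic like this is the point-biserial correlation: the ordinary Pearson correlation between a 0/1 flag (did the participant select this choice?) and the participant's total test score. I am assuming that approach for illustration; the post does not say which formula Questionmark actually uses, and the function name `outcome_correlation` and the records are hypothetical.

```python
from math import sqrt


def outcome_correlation(records, choice):
    """Pearson correlation between selecting `choice` (1/0) and total score.

    records: list of (choice_selected, total_test_score) pairs.
    """
    n = len(records)
    if n < 2:
        raise ValueError("need at least two participants")
    flags = [1.0 if c == choice else 0.0 for c, _ in records]
    scores = [float(s) for _, s in records]
    mean_flag = sum(flags) / n
    mean_score = sum(scores) / n
    covariance = sum((f - mean_flag) * (s - mean_score)
                     for f, s in zip(flags, scores))
    var_flag = sum((f - mean_flag) ** 2 for f in flags)
    var_score = sum((s - mean_score) ** 2 for s in scores)
    if var_flag == 0 or var_score == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return covariance / sqrt(var_flag * var_score)


# Hypothetical records: (choice selected on this item, total test score).
records = [
    ("A", 20), ("B", 25), ("A", 35), ("D", 40), ("B", 50),
    ("C", 55), ("C", 60), ("D", 65), ("C", 85), ("C", 90),
]
print(outcome_correlation(records, "C"))  # right choice: want it positive
print(outcome_correlation(records, "A"))  # wrong choice: want it negative
```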


For the right choice, you always look for Outcome Discrimination and Outcome Correlation to be positive, and typically you look for values above +0.3. Your rigor and concern over the statistical properties of a test item will, however, depend upon whether it is being used for low, medium or high stakes assessments.

Well I hope that you found that useful and if you want to know more please check out Greg Pope’s blog posting on Psychometrics 101 or buy (and read) Sharon and Bill’s book!


Entry filed under: Assessment, Professional Life.
