Fidelity of an Assessment

From the calculation shown above, you can see that a worthy goal of creating, delivering and reporting on assessments is to minimize the Error of Measurement. However, this competes with another worthy goal: ensuring that assessments are affordable.
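One common way to express the Error of Measurement uses the classical test theory model. It may differ from the calculation pictured above. The model treats an observed score as a true score plus error, and summarizes the size of that error with the standard error of measurement (SEM):

```latex
\begin{aligned}
X &= T + E \\
\mathrm{SEM} &= \sigma_X \sqrt{1 - \rho_{XX'}}
\end{aligned}
```

Here X is the observed score, T the true score, E the error, σ_X the standard deviation of observed scores, and ρ_XX' the reliability of the assessment. For example, with σ_X = 10 and ρ_XX' = 0.91, SEM = 10 × √0.09 = 3 points. The more reliable the assessment, the smaller the SEM.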

The more closely we can simulate the actual performance environment, the actual knowledge to be recalled, and/or the skills and abilities to be used, the smaller the error of measurement. So if we were assessing your driving performance, we should put you in a car and go driving with you.

Simple enough! But driving without crashing, while a worthy goal, is hardly the bar that we should set for our driving tests! We need to ask ourselves what real performance looks like, which behaviors can predict good performance, and what the appropriate ways to measure them are.

So what are the performance characteristics of a good driver? Good eyesight, the ability to control the car, spatial awareness, understanding of and obedience to road signs, signaling intent to others, etc. It might be better to assess some of these attributes before we jump in a car! So let’s determine the criteria for successful performance and then assess each attribute (sight, road signs, rules of the road, etc.) with a suitable form of assessment before we start observing someone’s actual performance.

We would start by determining the knowledge, skills and abilities required to perform a task in a real-world situation. Then we would choose appropriate assessments, starting with less expensive ones and progressing toward environments closer to the real world.
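The staged approach of starting cheap and moving closer to the real world can be sketched in a few lines of Python. This is a minimal illustration under three assumptions: each stage is a simple pass/fail check, cost is a reasonable proxy for how close a stage is to the real world, and the stage names, costs and the `run_staged_assessment` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Stage:
    name: str
    cost: float  # hypothetical relative cost per candidate, not real data
    passes: Callable[[Dict[str, bool]], bool]  # hypothetical pass/fail rule


def run_staged_assessment(stages: List[Stage], candidate: Dict[str, bool]) -> Tuple[str, float]:
    """Run the stages cheapest first and stop at the first failure.

    Returns the outcome and the total cost spent on this candidate.
    """
    total_cost = 0.0
    for stage in sorted(stages, key=lambda s: s.cost):
        total_cost += stage.cost
        if not stage.passes(candidate):
            return f"failed at {stage.name}", total_cost
    return "passed all stages", total_cost


# Hypothetical driving-test stages, deliberately listed out of cost order.
stages = [
    Stage("road test", 100.0, lambda c: c["road"]),
    Stage("eyesight check", 1.0, lambda c: c["sight"]),
    Stage("road-sign quiz", 2.0, lambda c: c["signs"]),
]

print(run_staged_assessment(stages, {"sight": True, "signs": False, "road": True}))
# ('failed at road-sign quiz', 3.0)
```

The cheaper checks run first. A candidate who fails the road-sign quiz therefore never incurs the cost of a road test.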

Now the question is: how do we assess someone’s potential performance within a low-cost simulated environment? The key is to place someone into the context of the performance environment. We can do this by using scenario questions with whatever stimulus we can afford or justify:

  • Low cost
    Tell a story about a situation and ask questions related to that story. “You travelled to Morocco with your friend who rented a car and you came upon a road sign blah, blah, blah. When would you turn left?”
  • Medium cost
    Produce pictures and sounds and let the pictures and sounds tell the story. “Please follow along with the pictures and answer the question: When would you turn left?”
  • High cost
    Show a video that simulates driving. The person could either interact with the simulation or answer multiple questions about the video and/or simulation as they progress through the experience.

The closer the simulation is to the real world (sounds, smells, sights, danger, etc.), the more accurate the measurement. When we simulate something to 100%, we are in the real world!

In certain situations we want to measure your performance whilst your adrenaline levels are high or when your fight-or-flight mechanisms are kicking in. To do that, we have to raise the fidelity of the simulation without causing harm.

For instance, suppose we want to learn how you would react in a crash. We can’t go around having you crash things, as that would be a danger to yourself and others. However, by using low- and high-fidelity simulations, we could produce fairly accurate predictors of how you would act during a crash.

The picture above tries to illustrate this idea of fidelity. The higher the fidelity of the stimulus, the more accurate the measurement can be. However, we may not need, or be able to afford, high fidelity. So we have some options:

  • Text stimulus

    Advantages: Easy, inexpensive, and appropriate for many situations.
    Disadvantages: Might inadvertently assess reading skills. Rarely stimulates the body to react with fight-or-flight mechanisms, so a person might respond differently in real life than during an assessment.

  • Picture stimulus

    Advantages: Easy, inexpensive, and images can convey real-world situations. Can focus on assessing the topic rather than also assessing language skills.
    Disadvantages: Rarely stimulates the body to react with fight-or-flight mechanisms.

  • Video stimulus

    Advantages: Videos can convey real-world situations in a repeatable way. Can focus on testing the topic rather than also assessing language skills. Can stimulate stronger emotional reactions.
    Disadvantages: More expensive to produce.

  • Interactive stimulus (Gaming)

    Advantages: Games can convey real-world situations. Can focus on testing the topic rather than also assessing language skills. Can stimulate stronger emotional reactions.
    Disadvantages: More expensive to produce.
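The trade-off can be expressed as a small lookup: pick the highest-fidelity stimulus you can afford. This sketch assumes that fidelity follows the order of the list above, and it uses made-up relative costs. The names `STIMULI` and `highest_fidelity_within_budget` are illustrative only.

```python
from typing import List, Optional, Tuple

# Stimulus types from the list above, ordered from lowest to highest fidelity.
# The relative costs are hypothetical placeholders, not measured figures.
STIMULI: List[Tuple[str, float]] = [
    ("text", 1.0),
    ("picture", 2.0),
    ("video", 5.0),
    ("interactive", 10.0),
]


def highest_fidelity_within_budget(budget: float) -> Optional[str]:
    """Return the highest-fidelity stimulus whose cost fits the budget, or None."""
    affordable = [name for name, cost in STIMULI if cost <= budget]
    # STIMULI is ordered by fidelity, so the last affordable entry is the best one.
    return affordable[-1] if affordable else None


print(highest_fidelity_within_budget(6.0))  # video
```

If high fidelity is not needed for a given question, the same idea works with a lower cap in place of the whole budget.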

I hope that you find this helpful.

Bloom’s Taxonomy

Creating valid and reliable assessments requires us to distinguish between knowledge, comprehension, and higher levels of thinking. We need to measure what we want to measure!

Bloom’s Taxonomy is a first-class example of how we can think through these distinctions. For a detailed understanding of Bloom’s Taxonomy I’d recommend the book “A Taxonomy for Learning, Teaching, and Assessing” by Lorin W. Anderson, David R. Krathwohl, Peter W. Airasian, and Kathleen A. Cruikshank. In this article I’d like to outline the distinctions between the six levels and perhaps motivate you to learn more.

Knowledge (memory recall)

At the knowledge level we would expect people to remember things such as facts, words, colors, terminology, sequences, methods, etc. Knowledge checks memory recall and nothing more. 

Key words used within a question stem at this level might be: define, describe, identify, label, list, match, name, outline, recall, reproduce, select, state.

In this article I’ll use examples related to driving, as driving is readily understood by people from diverse cultures. A driving test question at this level could be:

What color traffic light requires you to stop?

    • Yellow
    • Orange
    • Red
    • Green


Comprehension (understanding)

At the comprehension level we would expect people to understand the facts rather than just know them and have the ability to translate abstract ideas into concrete terms.

Words used within a question stem at this level might be: compare, convert, describe, defend, distinguish, estimate, explain, extrapolate, infer, interpret, organize, paraphrase, rewrite, state main idea, summarize, translate.

A driving test question at this level could be:

Please check all that apply to stop signs and traffic lights:

    • You must always stop at a stop sign
    • You must always stop at a traffic light
    • Traffic lights change color
    • Traffic lights have three colored lights
    • Stop signs change color



Application

At the Application level we would expect people to solve problems in new situations by applying knowledge, facts, techniques and rules in different ways.

Words used within a question stem at this level might be: apply, change, compute, construct, demonstrate, discover, execute, manipulate, modify, operate, prepare, produce, relate, show, solve, and use.

High-fidelity environments and observations might be used to accumulate evidence at this level, but a driving test question could be:

You are driving at 25 miles/hour (40 km/hour) when a traffic light changes from green to yellow/orange and you are 7 feet (2 meters) away. In this situation, what would you do?

    • Brake hard and ensure you stop before the light
    • Continue through the light if it is safe to do so
    • Accelerate to avoid crossing a red light
    • Refer to the rules of the road manual
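A quick check of the physics helps explain why braking hard is unlikely to be the best choice in this scenario. The sketch below assumes constant deceleration and a reaction time of about 1.5 seconds. That reaction time is an assumed typical figure, not one from the question. The helper `required_deceleration` is a hypothetical name.

```python
G = 9.81              # standard gravity, m/s^2
MPH_TO_MPS = 0.44704  # 1 mile/hour is exactly 0.44704 meters/second


def required_deceleration(speed_mps: float, distance_m: float) -> float:
    """Constant deceleration needed to stop within distance_m, from v^2 = 2ad."""
    return speed_mps ** 2 / (2 * distance_m)


speed = 25 * MPH_TO_MPS  # speed from the question, in m/s (about 11.2)
distance = 2.0           # distance to the light from the question, in meters
reaction_time = 1.5      # seconds; an assumed typical value, not from the question

decel = required_deceleration(speed, distance)
reaction_distance = speed * reaction_time  # ground covered before braking even starts

print(f"deceleration needed: {decel:.1f} m/s^2 ({decel / G:.1f} g)")  # 31.2 m/s^2 (3.2 g)
print(f"distance covered while reacting: {reaction_distance:.1f} m")  # 16.8 m
```

Stopping within 2 meters would need roughly three times the acceleration of gravity, far beyond what an ordinary car can achieve. The driver would also be well past the light before braking even began. This is the kind of reasoning an application-level question asks the candidate to do.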



Analysis

At the Analysis level we would expect people to examine the facts, break information into parts by identifying motives or causes, compare and contrast, distinguish between fact and inference, and make inferences and find evidence to support theories.

Words used within a question stem at this level might be: analyze, break down, compare, contrast, diagram, deconstruct, differentiate, distinguish, identify, investigate, infer, and solve.

At this level we could think of a policeman visiting the site of an accident, seeking and collecting evidence, analyzing possibilities, and organizing evidence to support various theories. High-fidelity environments and observations might be used to accumulate evidence at this level, but a test question could be:

As a policeman, you have been called to the scene of an accident. Please look at the pictures and read the transcripts of the interviews to help you understand and analyze the scene.

Please write your report to record the evidence and your analysis.

Synthesis

At the Synthesis level we would expect people to compile information in a different way by combining diverse elements in new patterns and/or proposing alternative solutions.

Words used within a question stem at this level might be: categorize, combine, compile, compose, create, develop, devise, design, explain, formulate, generate, modify, organize, plan, rearrange, reconstruct, reorganize, revise, rewrite, and summarize.

At this level we could think of a lawyer developing his arguments in a case to defend his client. At this level information should be abundant, and a test question could be:

Please reference the attached police reports and witness depositions, and develop a case to defend your client against a charge of going through a red light at speed.

Evaluation

At the Evaluation level we would expect people to present and defend opinions by making judgments about information, the validity of ideas, or the quality of work based on a set of criteria/rules.

Words used within a question stem at this level might be: appraise, conclude, criticize, critique, defend, evaluate, interpret, justify, measure, summarize, support, and test.

At this level we could think of a judge who has considered the evidence and listened to the lawyers’ arguments; again, information is abundant. At this level a test question could be:

Please reference the attached police reports, witness depositions, and arguments presented by either side, decide whether the defendant is guilty or not guilty, and explain how you reached your conclusion.

By understanding the distinctions presented by Bloom’s Taxonomy, we are able to assess more accurately the knowledge, skills and abilities required to yield performance.

With the use of appropriate scenario-style questions, potentially supplemented with documents, pictures, sounds, and videos, it is possible to test at the higher levels of Bloom’s Taxonomy.
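The key-word lists above can double as a rough first-pass check on a question stem. This is a minimal Python sketch that assumes exact whole-word matching, so "defines" would not match "define". `BLOOM_KEYWORDS` and `candidate_levels` are hypothetical names. Several verbs, such as compare, describe, explain and summarize, appear under more than one level, so the sketch returns every candidate level rather than a single answer.

```python
import re
from typing import Dict, List

# Key words for each level, taken from the lists in this article.
BLOOM_KEYWORDS: Dict[str, List[str]] = {
    "Knowledge": ["define", "describe", "identify", "label", "list", "match",
                  "name", "outline", "recall", "reproduce", "select", "state"],
    "Comprehension": ["compare", "convert", "describe", "defend", "distinguish",
                      "estimate", "explain", "extrapolate", "infer", "interpret",
                      "organize", "paraphrase", "rewrite", "state main idea",
                      "summarize", "translate"],
    "Application": ["apply", "change", "compute", "construct", "demonstrate",
                    "discover", "execute", "manipulate", "modify", "operate",
                    "prepare", "produce", "relate", "show", "solve", "use"],
    "Analysis": ["analyze", "break down", "compare", "contrast", "diagram",
                 "deconstruct", "differentiate", "distinguish", "identify",
                 "investigate", "infer", "solve"],
    "Synthesis": ["categorize", "combine", "compile", "compose", "create",
                  "develop", "devise", "design", "explain", "formulate",
                  "generate", "modify", "organize", "plan", "rearrange",
                  "reconstruct", "reorganize", "revise", "rewrite", "summarize"],
    "Evaluation": ["appraise", "conclude", "criticize", "critique", "defend",
                   "evaluate", "interpret", "justify", "measure", "summarize",
                   "support", "test"],
}


def candidate_levels(stem: str) -> List[str]:
    """List every level whose key words appear as whole words in a question stem."""
    text = stem.lower()
    return [
        level
        for level, words in BLOOM_KEYWORDS.items()
        if any(re.search(rf"\b{re.escape(word)}\b", text) for word in words)
    ]


print(candidate_levels("Define the term 'right of way'."))          # ['Knowledge']
print(candidate_levels("Compare stop signs and traffic lights."))   # ['Comprehension', 'Analysis']
```

The overlap is the point. A key word hints at the intended level, but what decides the level being measured is the task the question actually sets: recalling a fact, interpreting it, applying it, and so on. The knowledge-level traffic-light question earlier, for instance, contains none of these key words at all.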

World Values in an Assessment Context

Last week a good friend of mine, Eugene Burke of SHL, introduced me to the Values Map developed by the World Values Survey (WVS). The map shown to the right came from the PhD work of John Sponney that was shared with Eugene some years ago.

Divided by categories like Self-Enhancement, Individual Dynamics, Group Dynamics, and Consideration for Others, the diagram clusters countries together in cultural patterns to explain educational standards, which directly affect the value systems of individuals.

It was great to have my friend reveal to me the potential impacts of culture in the assessment context.  No perfect models exist for cultural analysis, especially in the context of current migration patterns, but consider the following possibilities:

  • Folks with Anglo-Saxon tendencies might take a more analytical approach to the individual. In regard to testing, organizations and individuals may draw more comfort from the results of an assessment of knowledge/skills/aptitude/attitudes than from an interview or knowledge of a person’s family background.
  • In a Latin European context, by contrast, tests are given less credibility, and face-to-face interviews that confirm an individual can stand behind their presentations, thesis, etc., hold more credence in the eyes of the beholder. An individual might undergo a series of face-to-face interviews that allow the interviewee to express themselves and also give the interviewer room to form an opinion based on an emotional connection.

Try judging which is right and which is wrong and enjoy the barrage of comments! Some people passionately believe in a measured/testing approach and others in a personal connection/interview/Q&A approach. From my point of view, value comes from considering the knowledge, skills, abilities and attitudes required and the context of the assessment (coaching, decision making, diagnostic, prescriptive, etc.), and then using the right balance of one-on-one and testing approaches.

Within the diagram it’s easy to see the intuitive differences that might drive us to use different kinds of assessments. An example that jumped out at me relates to a contrast in value systems. Cheating, it could be concluded, is wrong. But the minute we take the cultural context into consideration, we might discover that cheating may be thought of as solidarity within a culture that promotes collectivism and loyalty. This adds the nuance that “helping” may not be seen as “cheating”.

The clusters in the diagram are oversimplified, since they are based on location, and migration patterns will cause some cultural traits to commingle. This will further complicate conducting valid and reliable multilingual assessments of knowledge, skills and abilities. But let’s not give up; let’s remind ourselves of the differences so that we provide the right stimulus in the right context, track the right measure in a timely fashion, and then provide the right feedback to the right person at the right time!

Every cloud has a silver lining and every silver lining has a cloud!

