
The Art and Science of Test Maintenance

“It ain't what you don't know that gets you into trouble. It's what you know for sure that just ain't so.”
- Mark Twain

One of the great pleasures of working at NICET is the opportunity to meet and learn from some of the best technicians and engineers in their fields.  They come together to build certification programs for their peers – a job they take very seriously.  Knowledge and experience from different regions and companies are carefully combined through NICET’s system of procedures and standards to produce, at a consistently high level of quality, the Content Outlines, test questions, experience requirements, and other elements of a certification process.

After many months of development work, the test has been written, edited, checked, reviewed, double-checked, approved, and published; the subject matter experts (SMEs) have all gone home.  The test is good – and it’s very tempting to think it will remain so for the next 5, 10, or 15 years.

Unfortunately, that tempting vision just isn’t in the cards.  Every work of human hands changes over time.  Systems and structures, from alarm systems to highway bridges, no matter how well constructed, require ongoing inspections and intermittent maintenance work.  The same is true for exams, and it leads to the same kind of inspection-and-maintenance cycle.

The “inspection” or monitoring phase of the cycle is carried out mainly by NICET staff, with occasional consultation with SMEs.  There may not be physical devices and structures that can be tested, but there are properties that can be measured and analyzed.  For example:

•  Question difficulty: the proportion of test-takers who answer the question correctly

•  Question discrimination: a measure of whether, and by how much, high-scoring candidates tend to answer correctly more often than low-scorers

•  Distractor analysis: applies the difficulty and discrimination measures to the individual answer options for multiple-choice questions

•  Reliability: a more complex measure that estimates scoring consistency.  (If candidates were to test multiple times, how likely is it that they would achieve the same, or nearly the same, score?)  See the sketch after this list for a rough illustration of how these measures are computed.
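For readers curious about the mechanics, the sketch below shows one way such statistics could be computed from a scored response matrix.  The sample data, variable names, and specific formulas (a point-biserial correlation for discrimination, KR-20 for reliability) are illustrative assumptions only, not a description of NICET's actual analysis software.

```python
# Illustrative sketch: basic item statistics from a scored response matrix.
# Data and formulas are examples only, not NICET's scoring system.
import statistics

# Each row is one candidate; each column is one question (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
]

num_items = len(responses[0])
totals = [sum(row) for row in responses]  # each candidate's total score

def difficulty(item: int) -> float:
    """Proportion of candidates answering the item correctly (higher = easier)."""
    return sum(row[item] for row in responses) / len(responses)

def discrimination(item: int) -> float:
    """Point-biserial correlation between the item score and the total score.
    Positive values mean high scorers get the item right more often than low scorers."""
    item_scores = [row[item] for row in responses]
    p = statistics.mean(item_scores)
    sd_total = statistics.pstdev(totals)
    if sd_total == 0 or p in (0.0, 1.0):
        return 0.0
    mean_total = statistics.mean(totals)
    mean_correct = statistics.mean(t for t, x in zip(totals, item_scores) if x == 1)
    return (mean_correct - mean_total) / sd_total * (p / (1 - p)) ** 0.5

def kr20() -> float:
    """KR-20 reliability estimate for the whole test (0 to 1; higher = more consistent)."""
    var_total = statistics.pvariance(totals)
    if var_total == 0:
        return 0.0
    pq_sum = sum(difficulty(i) * (1 - difficulty(i)) for i in range(num_items))
    k = num_items
    return (k / (k - 1)) * (1 - pq_sum / var_total)

# Distractor analysis would apply the same difficulty/discrimination logic to each
# answer option of a multiple-choice item, rather than to the item as a whole.
for i in range(num_items):
    print(f"Q{i + 1}: difficulty={difficulty(i):.2f}, discrimination={discrimination(i):.2f}")
print(f"KR-20 reliability: {kr20():.2f}")
```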

No one of these, by itself, can adequately evaluate the performance of an exam, but each can flag a test (or a test question) as needing further analysis.  That deeper investigation may involve examining how the statistics support or counter each other; checking whether there are enough test-takers for the statistics to be reliable; reviewing other evidence, such as comments from test-takers or aggregated work-history evaluation results; and consulting with SMEs.  If the analysis determines that maintenance is needed, that work will be expedited and the exams republished.  Any affected candidates will be notified as quickly as possible.
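To make the idea of "flagging" concrete, a screening rule might look like the small sketch below.  The thresholds and the minimum sample size are hypothetical values chosen purely for illustration; actual review criteria are set by psychometric staff working with SMEs.

```python
# Illustrative screening rule: flag an item for further review when its statistics
# fall outside hypothetical comfort ranges. All thresholds are examples only.

def needs_review(difficulty: float, discrimination: float, n_candidates: int,
                 min_n: int = 100) -> bool:
    if n_candidates < min_n:
        # Too few test-takers for the statistics to be trustworthy; defer judgment.
        return False
    too_easy_or_hard = difficulty > 0.95 or difficulty < 0.25
    weak_discrimination = discrimination < 0.10
    return too_easy_or_hard or weak_discrimination

# Example: a very easy question that low scorers answer as often as high scorers.
print(needs_review(difficulty=0.97, discrimination=0.02, n_candidates=450))  # True
```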

More often, though, “maintenance” work is routine and periodic.  Examples of routine maintenance include updating questions to reflect revised standards, or writing new test questions to replace ones based on technology that is becoming outdated or that have simply been on the exam too long.

Every statistical review and maintenance effort is an opportunity to find some new and unanticipated way to improve a program – and, with help from a great pool of SME talent, to keep on learning.