Instead, a study is conducted to determine what score best differentiates the classifications of examinees, such as competent vs. not competent. Such studies require considerable resources and involve a number of professionals, in particular those with a psychometric background. Standard-setting studies are for that reason impractical for regular classroom situations, yet standard setting is performed at every level of education, and multiple methods exist. Standard-setting studies are typically performed using focus groups of 5-15 subject matter experts (SMEs) that represent key stakeholders for the test.
For example, in setting cut scores for educational testing, experts might be instructors familiar with the capabilities of the student population for the test. Standard-setting studies fall into two categories, item-centered and person-centered. Examples of item-centered methods include the Angoff and Bookmark methods, while examples of person-centered methods include the Borderline Survey and Contrasting Groups approaches. The Angoff method requires the assembly of a group of subject matter experts, who are asked to evaluate each item and estimate the proportion of minimally competent examinees that would correctly answer the item.
The ratings are averaged across raters for each item and then summed to obtain a panel-recommended raw cutscore. This cutscore then represents the score which the panel estimates a minimally competent candidate would get. Calibration with other, more objective, sources of data is preferable. Several variants of the method exist.
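The averaging-and-summing step described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `angoff_cut_score` and the layout of `ratings` (one list of per-item proportions per rater) are assumptions made for the example.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """Return a panel-recommended raw cut score.

    ratings[r][i] is rater r's estimate (between 0 and 1) of the proportion
    of minimally competent examinees who would answer item i correctly.
    The layout is hypothetical; adapt it to however ratings are recorded.
    """
    if not ratings:
        raise ValueError("at least one rater is required")
    n_items = len(ratings[0])
    if any(len(row) != n_items for row in ratings):
        raise ValueError("every rater must rate every item")

    # Average across raters for each item, then sum the item averages.
    item_means = [mean(row[i] for row in ratings) for i in range(n_items)]
    return sum(item_means)


# Two raters, three items: item averages are 0.5, 0.5 and 1.0.
example_ratings = [
    [0.5, 0.75, 1.0],
    [0.5, 0.25, 1.0],
]
print(angoff_cut_score(example_ratings))  # 2.0
```

The result is on the raw-score scale, so a value of 2.0 on a three-item test means the panel expects a minimally competent candidate to answer about two items correctly.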
In one variant, subject matter experts are briefed on the Angoff method and allowed to take the test with the performance levels in mind. This method is generally used with multiple-choice questions. In another item-centered approach, SMEs make decisions on a question-by-question basis regarding which of the question distractors they feel borderline participants would be able to eliminate as incorrect. This method is generally used with multiple-choice questions only.
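The text does not say how the distractor-elimination judgments are turned into a cut score. The sketch below assumes the rule commonly associated with the Nedelsky method: a borderline participant guesses at random among the options not eliminated, so each item contributes the reciprocal of the number of remaining options. The function name and argument layout are hypothetical.

```python
def distractor_elimination_cut_score(options_per_item, eliminated_per_item):
    """Sum, over items, the chance of a correct random guess among the
    options a borderline participant could not eliminate.

    options_per_item[i]    -- total answer options for item i
    eliminated_per_item[i] -- distractors judged eliminable for item i
    Assumed scoring rule: 1 / (options remaining) per item.
    """
    if len(options_per_item) != len(eliminated_per_item):
        raise ValueError("both lists must cover the same items")

    total = 0.0
    for n_options, n_eliminated in zip(options_per_item, eliminated_per_item):
        remaining = n_options - n_eliminated
        # The correct answer is never eliminated, so at least one option remains.
        if n_eliminated < 0 or remaining < 1:
            raise ValueError("eliminated count must be between 0 and options - 1")
        total += 1.0 / remaining
    return total


# Three four-option items with 2, 3 and 0 distractors eliminated:
# 1/2 + 1/1 + 1/4
print(distractor_elimination_cut_score([4, 4, 4], [2, 3, 0]))  # 1.75
```

In a panel setting, the per-item judgments would first be combined across SMEs; how that is done is not specified in the text and is left out of the sketch.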
For example, for a response probability of . Rather than the items that distinguish competent candidates, person-centered studies evaluate the examinees themselves. While this might seem more appropriate, it is often more difficult because examinees are not a captive population, as is a list of items. The testing organization could analyze and evaluate the relationship between the test scores and important examinee characteristics, such as skills, education, and experience.
The cutscore could be set as the score that best differentiates between those examinees characterized as “passing” and those as “failing.” In the Borderline approach, a description is prepared for each performance category, and participants whose performance falls at the border between categories are identified. The test is administered to these borderline groups and the median test score is used as the cut score. In the Contrasting Groups approach, SMEs are asked to categorize the participants in their classes according to the performance category descriptions.
The test is administered to all of the categorized participants and the test score distributions for each of the categorized groups are compared. Where the distributions of the contrasting groups intersect is where the cut score would be located.
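The two person-centered procedures described above can be sketched as follows. The borderline calculation is simply a median. For the contrasting groups, the sketch does not fit distributions and solve for their crossing point; it substitutes a simpler stand-in, picking the observed score that misclassifies the fewest participants, which lies near the intersection when the groups are of similar size. Function names and sample scores are hypothetical.

```python
from statistics import median


def borderline_cut_score(borderline_scores):
    """Median test score of the participants judged to be borderline."""
    return median(borderline_scores)


def contrasting_groups_cut_score(lower_scores, upper_scores):
    """Approximate where two score distributions intersect.

    lower_scores -- scores of participants categorized below the standard
    upper_scores -- scores of participants categorized at or above it
    Returns the observed score that minimizes misclassifications; ties are
    resolved in favor of the lowest such score (an arbitrary choice).
    """
    candidates = sorted(set(lower_scores) | set(upper_scores))
    best_cut, best_errors = None, None
    for cut in candidates:
        # Lower-group participants who would pass, plus
        # upper-group participants who would fail, at this cut.
        errors = sum(s >= cut for s in lower_scores) + sum(
            s < cut for s in upper_scores
        )
        if best_errors is None or errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut


print(borderline_cut_score([48, 52, 55, 57, 61]))  # 55
print(contrasting_groups_cut_score([10, 12, 13, 14, 16],
                                   [15, 17, 18, 19, 20]))  # 15
```

With larger samples, smoothing the two distributions before locating the crossing point would usually give a more stable result than working from raw observed scores.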