Compensatory vs. Non-Compensatory
Within multidimensional models, the different dimensions of the latent traits can be compensatory or non-compensatory. In a compensatory model, the probability of a response depends on a linear combination of the latent traits, so a high level on one trait can compensate for a low level on another. In a non-compensatory model, the probability of a response depends on a product, or some other nonadditive function, of the latent traits, so a high level on one trait cannot compensate for a low level on another.
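The contrast can be made concrete with a small sketch. This uses the common multidimensional 2PL parameterization as an illustration; the names (`a` for discriminations, `d` and `b` for difficulty parameters) are assumptions for the example, not taken from the text above.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def p_compensatory(theta, a, d):
    """Compensatory model: probability depends on the linear combination
    a . theta + d, so a high trait can offset a low one."""
    z = sum(ai * ti for ai, ti in zip(a, theta)) + d
    return logistic(z)

def p_noncompensatory(theta, a, b):
    """Non-compensatory model: probability is a product of per-dimension
    terms, so a deficit on any single trait caps the overall probability."""
    p = 1.0
    for ti, ai, bi in zip(theta, a, b):
        p *= logistic(ai * (ti - bi))
    return p

# An examinee strong on trait 1 but weak on trait 2:
theta = [2.0, -2.0]
a = [1.0, 1.0]
print(p_compensatory(theta, a, d=0.0))            # 0.5: the traits offset
print(p_noncompensatory(theta, a, b=[0.0, 0.0]))  # ~0.10: the weak trait dominates
```

The product form mirrors the calculus-in-English example below: one near-zero factor (English comprehension) drives the whole probability down, no matter how strong the other trait is.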
It is hard to say which form is right, since cases exist for both. For example, in the GRE Verbal section, even if we do not know the meaning of one sentence, we can still find the right answer, because there are strong logical relations among the parts of a sentence. However, if a Chinese student is asked to solve a calculus problem posed in English, he will have much trouble if he cannot understand the English, even if he is very strong in calculus. Therefore, whether we choose a compensatory or a non-compensatory model depends on the real situation. The weights of the different latent traits also need to be considered: if a high-priority dimension is low, it will have a critical influence on the final response.
Who Will Guarantee the Quality of the Item Bank?
With the introduction of CAT, most of our effort goes into model design; in other words, we only care about making it work. Compared with a traditional test, in which items are designed and used only once, CAT places much stricter requirements on the item pool. As we know, a perfect item bank will not guarantee perfect performance in CAT; yet today we are far from even a perfect item bank. Besides, what does a GOOD item pool mean? How do we measure item quality? How do we use the items optimally? How do we update them? How large should the pool be? If we are lost at the very beginning, how can we persuade people to accept the authority of CAT?
Can We Measure Software?
Using IRT or related measurement theory, we can measure the latent traits of human beings. Software, being an object in nature like all other objects, has similar metrics, such as usability and robustness, so perhaps we could measure these indices with the theory used in adaptive testing. By administering a series of items adaptively, our system can determine the hidden level of an examinee; similarly, can we do the same thing with software? At the least, we could determine whether two programs of the same type, such as IE and Netscape, have the same strength in a specific direction. The steps involved can be listed as follows (based on measurement decision theory):
1) Design the item bank;
2) Run a pilot test on a series of programs, and estimate the probability of Master and Non-master in the population given a cutting score; also estimate the probability of satisfying or not satisfying each item given the Master or Non-master state;
3) Use the calibrated items on the potential examinees, the programs we want to classify as Master or Non-master;
4) Apply the same selection rules and stopping rules;
5) After a sequence of items, we know each program's state: Master or Non-master.
Of course, we can add more levels of state, such as Excellent, Good, Mature, Just-Pass, Needs Enhancement, Forgettable, and Fail.
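The classification loop in the steps above can be sketched as a Bayes update over the two states. This is a minimal sketch: the priors, the per-item probabilities, and the 0.95 threshold are hypothetical numbers standing in for the pilot-calibration results of step 2, and for brevity the items are administered in a fixed order rather than selected adaptively.

```python
def posterior(prior_master, p_correct_m, p_correct_n, correct):
    """Bayes update of P(Master) after observing one item response."""
    if correct:
        pm = prior_master * p_correct_m
        pn = (1 - prior_master) * p_correct_n
    else:
        pm = prior_master * (1 - p_correct_m)
        pn = (1 - prior_master) * (1 - p_correct_n)
    return pm / (pm + pn)

def classify(responses, items, prior=0.5, threshold=0.95):
    """Administer calibrated items until P(Master) or P(Non-master)
    exceeds the threshold (stopping rule), or the items run out.
    Each item is a pair (P(correct | Master), P(correct | Non-master))."""
    p = prior
    for (p_m, p_n), correct in zip(items, responses):
        p = posterior(p, p_m, p_n, correct)
        if p >= threshold:
            return "Master", p
        if 1 - p >= threshold:
            return "Non-master", p
    return ("Master" if p >= 0.5 else "Non-master"), p

# Hypothetical calibration from the pilot test (step 2):
items = [(0.9, 0.3), (0.85, 0.25), (0.8, 0.4), (0.9, 0.2)]
print(classify([True, True, True, True], items))    # classified Master
print(classify([False, False, False, False], items))  # classified Non-master
```

Extending to more states (Excellent, Good, Mature, and so on) means keeping a posterior over all states and updating each with its own calibrated item probabilities.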