Large-Scale Assessment
I am considering the possibility of promoting adaptive testing in China, which would certainly have to be large-scale. China holds many tests each year, so where should I start? ETS seems to have succeeded in running the GRE and TOEFL there, but the registration fee for each test is high. Maybe that results from the high cost of labor in the US. Beyond this factor, we need to take item exposure, equipment requirements, test validity, and test reliability into consideration for such a big population. Persuading the Chinese government to accept so novel a testing approach will be a critical task. However, given the influence of the GRE and TOEFL, the natural starting point is highly educated students (such as graduates or undergraduates). Of course, there are many other things that need to be considered.
Reliability of CAT
How can we trust computerized adaptive testing (CAT)? Any novel thing needs a long period to earn trust, and CAT is no different. Naturally, we will compare the results of CAT with those of a traditional paper-based test, and the ideal outcome is that examinees receive the same or similar scores on the two kinds of tests. From classical true-score theory, we know that the true score of each examinee is unknown. So we need to think of some reliable ways to evaluate our work on CAT.
Classical True-Score Theory
Learning from classical theories and discoveries will always benefit us: we find something hidden, notice ignored ideas, and avoid repeating mistakes.
"Classical true-score theory involves an additive model. An observed test score X is the sum of two components: a stable true score T and a random error score E ..."
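The additive model in the quotation can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the quoted text: true scores and errors are drawn from normal distributions, the function names and parameter values are made up for illustration, and reliability is taken as the usual ratio of true-score variance to observed-score variance.

```python
import random
import statistics


def simulate_scores(n_examinees, true_sd, error_sd, mean=50.0, seed=0):
    """Simulate X = T + E for a group of examinees (hypothetical helper)."""
    rng = random.Random(seed)
    # Stable true score T for each examinee.
    true_scores = [rng.gauss(mean, true_sd) for _ in range(n_examinees)]
    # Observed score X adds an independent random error E with mean zero.
    observed = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    return true_scores, observed


def reliability(true_scores, observed):
    """Ratio of true-score variance to observed-score variance."""
    return statistics.pvariance(true_scores) / statistics.pvariance(observed)


if __name__ == "__main__":
    t, x = simulate_scores(n_examinees=20000, true_sd=10.0, error_sd=5.0)
    # In theory the ratio is 10^2 / (10^2 + 5^2) = 0.8; the sample value is close.
    print(round(reliability(t, x), 3))
```

In a real test the true scores are never observed, so this ratio can only be computed in a simulation; in practice reliability has to be estimated indirectly, for example from the agreement between two administrations.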
Local Independence in Multidimensional Adaptive Testing
In a multidimensional test, we take more dimensions into consideration, hoping that several latent traits can be measured in one exam. However, dimensions other than the abilities we want to measure will also be brought into an already complex problem space. The most important issue is local independence: how can we maintain this important assumption not only for unidimensional tests but also for multidimensional ones?
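For reference, local independence is commonly stated as below. The notation is an assumption introduced here, not taken from the note above: U_i is the response to item i, u_i its observed value, n the number of items, and θ the vector of latent traits (a single number in the unidimensional case).

```latex
P(U_1 = u_1, \ldots, U_n = u_n \mid \boldsymbol{\theta})
  = \prod_{i=1}^{n} P(U_i = u_i \mid \boldsymbol{\theta})
```

In words, once all the traits in θ are held fixed, the responses to different items are statistically independent. An unmodeled extra dimension that affects several items at once is exactly what would break this equality.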
Item Response Theory
Item Response Theory (IRT) brought a revolution in educational and psychological measurement, replacing the traditional classical measurement theory. After years of sound theoretical work, IRT is now widely used in various large-scale tests, including the well-known GRE and TOEFL. So everyone engaged in measurement will have to learn IRT, whether or not it is their focus. Normally, even a new idea has its performance challenged against IRT, the dominant theory. For IRT itself, improvements and adjustments are welcome too, since nothing is the best, only better.
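To make the idea concrete, here is a minimal sketch of one common IRT model, the three-parameter logistic (3PL) model. The note above does not say which model is meant, so this choice, the function names, and the parameter values are all assumptions for illustration: a is item discrimination, b is item difficulty, c is the guessing parameter, and theta is the examinee's ability.

```python
import math


def prob_correct(theta, a=1.0, b=0.0, c=0.0):
    """Probability of a correct response under the 3PL model.

    With c = 0 this reduces to the two-parameter (2PL) model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def item_information_2pl(theta, a=1.0, b=0.0):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = prob_correct(theta, a=a, b=b, c=0.0)
    return a * a * p * (1.0 - p)


if __name__ == "__main__":
    # An examinee whose ability equals the item difficulty has a 50% chance
    # of answering correctly when there is no guessing.
    print(prob_correct(theta=0.0, a=1.0, b=0.0))
    # A 2PL item is most informative where theta equals its difficulty b.
    print(item_information_2pl(theta=0.0, a=1.0, b=0.0))
```

An adaptive test can use the information function to choose the next item: among the items not yet shown, pick the one with the highest information at the current ability estimate.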
Useful Resources about Measurement and Evaluation Online
http://www.mhhe.com/socscience/education/edmeas/resources.html
Other than IRT, what can we choose ...
For decades, IRT has been the dominant theory for adaptive testing, achieving great success. However, besides perfecting it, can we try some other approaches? Maybe we cannot find a new tool as excellent as IRT in the short term, but it is still worth trying, since explorers are always in demand in the progress of humankind.