Tuesday, December 28, 2004

Large Scale Assessment

I am thinking about the possibility of promoting adaptive testing in China, which would definitely be large-scale. China holds various tests each year, so where should I start? ETS seems to have succeeded in holding the GRE and TOEFL there, but the registration fee for each test is high. Maybe this results from the high cost of labor in the US. Beyond this factor, we need to take item exposure, equipment requirements, test validity, and test reliability into consideration for such a large population. It will be a critical task to persuade the Chinese government to accept such a novel testing approach. However, considering the influence of the GRE and TOEFL, the starting point should be highly educated students (such as graduates or undergraduates). Of course, there are many other things that need to be considered.

Tuesday, December 21, 2004

Reliability of CAT

How can we trust computerized adaptive testing (CAT)? Any novel thing needs a long period to earn trust, and CAT is no different. Naturally, we will compare CAT results with those of a traditional paper-based test, and the ideal outcome is that examinees are assigned the same or similar scores when they take both kinds of test. From classical true-score theory, we know that the true score of each examinee is unknown. So we need to think of some reliable ways to evaluate our work on CAT.

Classical True-Score Theory

Learning from classical theories and discoveries will always benefit us: we find something hidden, notice ideas that were overlooked, and avoid repeating mistakes.
"Classical true-score theory involves an additive model. An observed test score X is the sum of two components: a stable true score T and a random error score E ..."

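To make the additive model concrete for myself, here is a minimal simulation sketch in Python. Everything in it is an assumption for illustration, not part of the quoted text: the normal distributions, the numbers (true-score SD 10, error SD 5), and the helper names `simulate_scores` and `pearson`. It draws a true score T for each examinee, adds independent random error E twice to get two parallel forms, and compares their correlation with the theoretical reliability var(T) / (var(T) + var(E)).

```python
import random
import statistics


def simulate_scores(n_examinees, true_mean, true_sd, error_sd, seed=0):
    """Simulate X = T + E for two parallel forms (illustrative assumption)."""
    rng = random.Random(seed)
    # T: stable true score of each examinee (unknown in practice).
    true_scores = [rng.gauss(true_mean, true_sd) for _ in range(n_examinees)]
    # X = T + E, with an independent random error E for each form.
    form_a = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    form_b = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    return true_scores, form_a, form_b


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


true_scores, form_a, form_b = simulate_scores(5000, 50.0, 10.0, 5.0)

# Theoretical reliability: var(T) / (var(T) + var(E)).
theoretical_reliability = 10.0 ** 2 / (10.0 ** 2 + 5.0 ** 2)
# The correlation between parallel forms estimates the same quantity.
estimated_reliability = pearson(form_a, form_b)

print(theoretical_reliability, estimated_reliability)
```

The point of the sketch is that reliability can be estimated from observed scores alone, even though T itself is never observed. The same idea might apply when comparing a CAT score with a paper-based score for the same examinees.
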
Saturday, December 18, 2004

Local Independence in Multidimensional Adaptive Testing

In a multidimensional test, we take more dimensions into consideration, hoping that several latent traits can be measured in one exam. However, dimensions other than the intended abilities may also be brought into an already complex problem space. The most important issue is local independence: how can we maintain this important assumption not only for unidimensional tests but also for multidimensional ones?
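
As a reminder to myself, the local independence assumption is usually written as follows. The notation is my own choice for illustration: U_i is the response to item i, n is the number of items, and θ is the vector of latent traits (a single number in the unidimensional case).

```latex
P(U_1 = u_1, \ldots, U_n = u_n \mid \boldsymbol{\theta})
  = \prod_{i=1}^{n} P(U_i = u_i \mid \boldsymbol{\theta})
```

In words: once the latent traits are held fixed, the responses to different items are statistically independent. If an unmodeled dimension influences several items at once, this product form may no longer hold.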

Thursday, December 16, 2004

Item Response Theory

Item Response Theory (IRT) brought a revolution to educational and psychological measurement, replacing traditional classical measurement theory. After years of sound theoretical work, IRT is now widely used in various large-scale tests, including the well-known GRE and TOEFL. So everyone engaged in measurement will have to learn IRT, whether or not it is their focus. Normally, even the performance of a new idea needs to be compared against IRT, the dominant theory. For IRT itself, improvements and adjustments are welcome too, since we know there is no best, only better.
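
For readers new to IRT, a small sketch may help. I am assuming the two-parameter logistic (2PL) model here, which is only one of several common IRT models, and the function names are my own. It gives the probability of a correct response as a function of ability `theta`, item discrimination `a`, and item difficulty `b`, along with the item information that adaptive tests often use to choose the next item.

```python
import math


def prob_correct_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = prob_correct_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


# An examinee whose ability equals the item difficulty has a 50% chance.
p_at_difficulty = prob_correct_2pl(theta=0.0, a=1.0, b=0.0)

# Information peaks where theta equals b, so a well-matched item tells us more.
info_matched = item_information_2pl(theta=0.0, a=2.0, b=0.0)
info_mismatched = item_information_2pl(theta=0.0, a=2.0, b=3.0)

print(p_at_difficulty, info_matched, info_mismatched)
```

This is why an adaptive test tends to pick items whose difficulty is close to the current ability estimate.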

Wednesday, December 08, 2004

Useful Resources about Measurement and Evaluation Online

http://www.mhhe.com/socscience/education/edmeas/resources.html

Wednesday, December 01, 2004

Other than IRT, what can we choose ...

For decades, IRT has been the dominant theory for adaptive testing, achieving great success. However, besides refining it further, can we try some other approaches? Maybe we cannot find a new tool as excellent as IRT in the short term, but it is still worth trying, since explorers are always in demand in the progress of humankind.