This paper discusses our approach to the three subsequent phases of Benson and Clark’s (1982) model—
Construction, Quantitative Evaluation, and Validation—to provide a full disclosure of the research methods and
results used to develop and refine the NGCI iteratively through three versions. Section 2 describes the use of
Concept Inventories in Astro 101 courses and recaps the conclusions drawn from Williamson and Willoughby
(2012) to ground the development of the NGCI. The Construction phase follows in Section 3, including the
refinement of the four concept domains of the NGCI to bound the ideas (both scientifically correct and incorrect)
assessed by the instrument. Here, we also outline our approach to item construction. Section 4 presents the
Quantitative Evaluation of pilot-testing data from the three versions of the NGCI, along with the Classical Test
Theory statistical analysis, to motivate the changes made throughout the multi-step development of the NGCI. In Section 5, we
highlight the evolution of nine items to illustrate the iterative process by which multiple-choice questions were
evaluated and modified to ensure the conceptual breadth, scientific accuracy, and item clarity of the NGCI. In
Section 6, we draw on student and expert performance on the NGCI, as well as student interviews and expert review,
to argue for the validity of the instrument in measuring Astro 101 students’ understanding of
Newtonian gravity. Section 7 concludes with a summary and future plans.