Extensive rater training and calibration were not performed before scoring the data. Even so, rater effects are small for the conceptual sophistication construct (z ≤ ±0.2) and the validity construct (z ≤ ±0.3). The larger rater effects for the specificity construct (z < ±0.4) are grouped in two narrow clusters, notable in comparison to the relatively flat distribution of rater effects for the other two constructs. This grouping suggests that two different scoring interpretations were each applied relatively consistently. This interpretation is supported by rater reports indicating that MR responses, in which sinking is attributed to the density of the object being greater than the density of the medium, were particularly troublesome to score using the specificity outcome space. Some raters consistently placed these responses in the inexact category, as they are similar to responses in which sinking is attributed to the mass of an object being greater than its volume. Such a response is true only when mass is measured in grams (g) and volume is measured in milliliters (mL). Stated without units, therefore, such a response is inexact. Other raters consistently placed the density comparison in the exact category, as specifying units in this case is not necessary: one density will always be greater than the other regardless of the units used. Reconciling these scoring interpretations would likely decrease the rater effect for the specificity construct. The latter interpretation strikes us as more consistent with the spirit of the specificity construct map.
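The unit-dependence at issue can be made concrete with a small numerical sketch (the object and numbers here are ours, chosen for illustration, not drawn from the study data): the "mass greater than volume" comparison flips depending on the units chosen, whereas the "object density greater than medium density" comparison is invariant, because any consistent unit change rescales both densities by the same factor.

```python
# Hypothetical object: 8 g of material occupying 10 mL, placed in water.
mass_g, volume_ml = 8.0, 10.0
volume_l = volume_ml / 1000.0            # same volume expressed in liters

# "Mass > volume" depends on the units chosen for each quantity:
in_g_vs_ml = mass_g > volume_ml          # 8 vs 10   -> False
in_g_vs_l = mass_g > volume_l            # 8 vs 0.01 -> True (verdict flips)

# "Object density > medium density" does not. Note 1 g/mL == 1 kg/L,
# so converting units multiplies BOTH densities by the same factor.
rho_object_gml = mass_g / volume_ml      # 0.8 g/mL
rho_water_gml = 1.0                      # density of water in g/mL
rho_object_kgl = rho_object_gml * 1.0    # 0.8 kg/L
rho_water_kgl = rho_water_gml * 1.0      # 1.0 kg/L

sinks_gml = rho_object_gml > rho_water_gml   # False: the object floats
sinks_kgl = rho_object_kgl > rho_water_kgl   # False in kg/L as well

print(in_g_vs_ml, in_g_vs_l)   # False True
print(sinks_gml, sinks_kgl)    # False False
```

This is the sense in which the second group of raters read the density comparison as exact: the truth of the comparison survives any consistent choice of units.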
Although rater effects would likely be smaller after more extensive training and calibration, we recommend that rater effects always be rechecked after any training. Ideally, rater harshness estimates should be included in the measurement model, as was the case here, so that they do not introduce error into the person proficiency estimates.
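The idea of building rater harshness into the model can be sketched with the familiar many-facet Rasch formulation for a dichotomous response, in which a rater harshness term enters the logit alongside person proficiency and item difficulty. This is an illustrative sketch only (a dichotomous simplification of the polytomous model used here, with parameter values we chose for the example):

```python
import math

def p_success(theta, delta, rho):
    """Many-facet Rasch sketch: probability that a person with proficiency
    theta succeeds on an item of difficulty delta as judged by a rater of
    harshness rho. A harsher rater (larger rho) lowers the probability,
    so harshness is absorbed here rather than biasing the theta estimate."""
    return 1.0 / (1.0 + math.exp(-(theta - delta - rho)))

# Same person, same item, scored by a harsh vs a lenient rater
# (harshness values of +/-0.4 logits, chosen to mirror the size of the
# specificity-construct rater effects reported above):
print(round(p_success(0.0, 0.0, 0.4), 3))   # 0.401
print(round(p_success(0.0, 0.0, -0.4), 3))  # 0.599
```

Because the harshness term is estimated jointly with the other facets, systematic rater differences are modeled explicitly instead of leaking into the person proficiency estimates as error.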
With the rater effects accounted for by the measurement model, reliability was satisfactory (r = .85) and greater than that for the accuracy estimates (r = .67). The greater reliability of the estimates associated with the three EBRAS constructs as compared to those associated with the accuracy dimension reflects the additional information provided by a polytomous outcome space. That a meaningful polytomous outcome space can be consistently applied to the data reflects the advantages of the multilevel cognitive model of latent proficiency represented by the EBRAS construct maps. This is one of the reasons why we recommend that assessment design always begin with the development of construct maps rather than items.