Before rating students’ performances, observers completed a 2-hour training session that involved didactic instruction and video-based ratings of behaviors to calibrate for accuracy. Observers rated each scenario immediately after it ended. Four trained observers rated both scenarios of each session, with 2 exceptions: in 1 session only 3 observers rated the second scenario, and in another session only 3 raters were present for the entire session. A generalizability study was conducted to estimate coefficients for both relative and absolute decisions made by the observers, quantifying the degree of error variance and the reliability of the scoring. The questionnaire contained 3 open-ended questions about the simulation experience. Responses were transcribed and then analyzed using the qualitative methods of Miles and Huberman.40 Responses were first read, reread, listed, and coded, and then analyzed for themes. Trustworthiness was addressed through attempts at data triangulation, the use of participant quotes, and a search for discrepant cases.
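For readers unfamiliar with the two coefficients reported by a generalizability study, the following sketch shows their standard textbook form. It assumes, purely for illustration, a single-facet design in which persons (p) are fully crossed with raters (r). The design used in the study is not specified in the text and may have included additional facets such as scenario. The symbols are assumptions introduced here: \sigma^2_p is the variance among persons, \sigma^2_r the variance among raters, \sigma^2_{pr,e} the person-by-rater interaction confounded with residual error, and n_r the number of raters averaged over.

```latex
% Relative (norm-referenced) decisions: only variance that changes
% the rank ordering of persons counts as error.
E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \dfrac{\sigma^2_{pr,e}}{n_r}}

% Absolute (criterion-referenced) decisions: rater main-effect variance
% (overall leniency or severity) also counts as error.
\Phi = \frac{\sigma^2_p}{\sigma^2_p + \dfrac{\sigma^2_r}{n_r} + \dfrac{\sigma^2_{pr,e}}{n_r}}
```

Under these assumptions, \Phi can never exceed E\rho^2, because the absolute error term includes everything in the relative error term plus the rater main effect. Both coefficients increase as n_r increases, which is why sessions rated by 3 rather than 4 observers would be expected to yield somewhat less dependable scores.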