In vivo participant therapists were instructed to submit their final in-session video to be rated on the SFT Fidelity Rating Scale for inter-rater reliability. The first unanticipated problem was that therapist compliance with this step was exceedingly poor: despite bringing videos to supervision each week, few therapists submitted their final work for inter-rating. We developed more intensive methods to obtain videos for inter-rating, but many therapists in later cohorts worked for government-funded or audited agencies that would not permit session videos to be viewed by anyone other than the supervisor. The second unforeseen event was the discontinuation of the Minuchin Center training program when the third cohort completed training on May 31, 2011, which ended recruitment and data collection before we could reach our goal of recruiting many more therapists. Lastly, from the start of this research project, the instructor-supervisor indicated that he would not complete the Fidelity Rating Scale; however, he agreed to assign ordinal ratings to the students based on the criteria set forth in that scale. Together, these three events left us with insufficient inter-rater Fidelity Rating Scale data, so the Supervisor’s Ordinal Fidelity Rating was the only fidelity data available. Because these ordinal ratings do not allow the three cohorts to be consolidated into one data set, the data remained divided into three small cohorts, with the statistical limitations that accompany ordinal data and small sample sizes. We view these as imperfections inherent in conducting research within an ongoing training program. Despite these limitations, Castonguay et al. (2010) and others argue for the benefits of using practice as a natural laboratory, wherein researchers conduct naturalistic studies in real-world settings to obtain findings that may be more generalizable.