VII. THREATS TO VALIDITY
All projects selected for this study belong to the same organization and practice code review using the same tool. While the tool itself may be specific, prior work has shown that most reviewing performed today follows a similar workflow. Code review via CodeFlow resembles the processes built around other popular tools such as Gerrit, ReviewBoard, GitHub pull requests, and Phabricator. Many companies and open source projects that practice review use tools such as these rather than email. Like CodeFlow, these tools facilitate feedback from reviewers about the change, often allowing reviewers to comment on specific parts of the change [7], [31].
Most of the attributes calculated for this study can also be calculated for code reviews conducted with these other tools. In addition, the results of prior studies suggest that the code review practices of different OSS and commercial projects are largely similar [7].
We have attempted to mitigate threats to external validity by including projects that represent diverse product domains and platforms. Nonetheless, some biases remain: all projects are large-scale, relatively mature, and come from the same company. We attempted to validate the model training data and the results of the model’s classification in multiple ways: checking consistency with inter-rater reliability, using k-fold cross-validation, and comparing classification results with