In short, a new watchword in education needs to be not only "random assignment" but also "convergence," which is a criterion that will require a lot of scientific knowledge and thought.[6] The young Einstein gave a memorable explanation of the principle of convergence from multiple domains when he was still a patent clerk. He received a report from an eminent experimenter that was inconsistent with his theory that the mass of an electron increases with its velocity by a certain amount. The experimenter’s work had been done very carefully, and Einstein’s friend and mentor H. A. Lorentz was ready to give up the theory in view of the unfavorable data. But young Einstein was aware that experimental setups are subject to uncontrolled variables, and in a review of the subject published in 1907 he had this to say:
It will be possible to decide whether the foundations of the theory correspond with the facts only if a great variety of observations is at hand . . . In my opinion, both [the alternative theories of Abraham and Bucherer] have rather slight probability, because their fundamental assumptions concerning the mass of moving electrons are not explainable in terms of theoretical systems which embrace a greater complex of phenomena.[7]
The key phrases are "great variety of observations" and "embrace a greater complex of phenomena." Ultimately Einstein was shown to be correct, and the overhasty inferences from rigorous but narrow data gathering were wrong. Einstein understood the critical importance of accepting for the time being only those conceptions that converge independently from the widest complexes of phenomena.
This is a point that Steven Weinberg makes very amusingly. Using the example of medical research, which is similar to educational research in many respects, he cautions that conclusions drawn from experimental and statistical methods alone can be highly dubious without the explanatory support of fundamental science.
Medical research deals with problems that are so urgent and difficult that proposals of new cures often must be based on medical statistics without understanding how the cure works, but even if a new cure were suggested by experience with many patients, it would probably be met with skepticism if one could not see how it could possibly be explained reductively, in terms of sciences like biochemistry and cell biology. Suppose that a medical journal carried two articles reporting two different cures for scrofula: one by ingestion of chicken soup and the other by a king’s touch. Even if the statistical evidence presented for these two cures had equal weight, I think the medical community (and everyone else) would have very different reactions to the two articles. Regarding chicken soup I think that most people would keep an open mind, reserving judgment until the cure could be confirmed by independent tests. Chicken soup is a complicated mixture of good things, and who knows what effect its contents might have on the mycobacteria that cause scrofula? On the other hand, whatever statistical evidence were offered to show that a king’s touch helps to cure scrofula, readers would tend to be very skeptical because they would see no way that such a cure could ever be explained reductively... How could it matter to a mycobacterium whether the person touching its host was properly crowned and anointed or the eldest son of the previous monarch?[8]