Regardless of how much data is collected, the features extracted from a dataset are arguably more important than its sample size, because if the prediction features input to an ML model hold no meaningful information, the model will never perform well. While an ML model could be trained on all the available data, models that seek to guide rehabilitation programmes, injury prevention initiatives, movement quality assessments and training load optimisation should be built using only a small selection of features, to aid interpretability. Traditionally, studies in sports science have used a subjective selection of features (e.g., a signal’s maximal value), which can result in important information being discarded. More recently, data-driven approaches (e.g., principal component analysis [PCA]) have been introduced and are being used more frequently. For an optimal and robust collection of features, sports scientists should combine objective, data-driven features with domain-specific, knowledge-based features, because even data-driven feature extraction does not guarantee that all the information contained within the data is used. For example, Richter et al. (2019) sought to develop an ML model that could differentiate between the operated limb and the non-operated limb of athletes undergoing rehabilitation following anterior cruciate ligament reconstruction, and the limb of non-injured healthy control athletes, using kinematic and kinetic measures (time-normalised waveforms) recorded during a variety of exercises. To extract prediction features, PCA was used, as it captures the variability within the dataset. However, this approach may have missed information of potential benefit to the model, as the variability of a dataset does not necessarily correspond to differentiation/classification ability.
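The point that variability does not necessarily correspond to classification ability can be illustrated with a minimal sketch on synthetic data. This is not the data or pipeline of Richter et al. (2019); the two features, the group labels, and the helper names `pca_components` and `separation` are all assumptions made for illustration. One feature has large variance but is identical across groups, while the other has small variance but differs between groups.

```python
import numpy as np


def pca_components(X):
    """Return principal axes (as rows) and explained-variance ratios via SVD."""
    Xc = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2 / (len(X) - 1)
    return vt, var / var.sum()


def separation(scores, labels):
    """Absolute standardised mean difference between two groups."""
    a, b = scores[labels == 0], scores[labels == 1]
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return abs(a.mean() - b.mean()) / pooled_sd


rng = np.random.default_rng(0)
n = 200  # hypothetical number of observations per group
labels = np.repeat([0, 1], n)

# Feature 0: large variance, same distribution in both groups (uninformative).
f0 = rng.normal(0.0, 10.0, size=2 * n)
# Feature 1: small variance, but its mean differs between groups (informative).
f1 = rng.normal(0.0, 1.0, size=2 * n) + 2.0 * labels
X = np.column_stack([f0, f1])

axes, ratio = pca_components(X)
scores = (X - X.mean(axis=0)) @ axes.T  # PC scores, one column per component
sep = [separation(scores[:, k], labels) for k in range(2)]

for k in range(2):
    print(f"PC{k + 1}: explained variance = {ratio[k]:.2f}, "
          f"group separation = {sep[k]:.2f}")
```

In this construction the first principal component explains almost all of the variance yet separates the groups poorly, whereas the second component explains very little variance but carries the group difference. Retaining components by explained variance alone would therefore discard the informative one, which is why combining data-driven features with knowledge-based features, or assessing features against the classification target, can be worthwhile.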