Genetic Algorithm based Rough Set Clustering (GARSC) is a clustering technique that integrates genetic algorithm and rough set theory together. Genetic algorithm is applied to determine optimal or at least satisfactory suboptimal solutions. Rough set theory is incorporated to alleviate the curse of dimensionality problem [32], which leads to unnecessarily large network sizes of many established inference systems. By applying rough set approximations, the original knowledge base is greatly reduced without losing essential information. This wonderful characteristic of rough set theory is extremely helpful to improve the interpretability [33] of an existing inference rule base, i.e. reduce the number of features needed for reasoning, the number of rules employed, and the number of arguments stated in each rule. Therefore, the overall proposed system achieves a high level of interpretability without sacrificing accuracy. The overall GARSC process is illustrated in Fig. 1.
Please note that up to here the inference rules are crisp decision rules and we need to transform them into fuzzy rules by generating Gaussian type fuzzy membership functions based on the clustering results and assigning corresponding linguistic terms. Subsequently, the transformed fuzzy rules are used to evaluate the performance of the current solution. This knowledge transfer process is illustrated in Fig. 2.
This particular process of knowledge transfer from crisp to fuzzy cannot be omitted because the crispness of separation adopted in rough set theory does not tolerant overlapping. Fuzzy membership functions are employed to represent the derived clusters to deal with inexact information and unforeseen circumstances. This kind of knowledge transfer has a great advantage because it naturally prevents the fuzzy membership functions from overlapping or separating too much with adjacent ones, which is another important aspect of interpretability in fuzzy modeling. Furthermore, because clustering is performed in each individual feature, no transformation or normalization is required and more importantly, semantic meanings of the assigned linguistic labels are preserved.