17.3.2 Nonlinear Principal Components Analysis
The second approach to creating a product catalog map discussed here is based
on nonlinear principal components analysis (NL-PCA). In this approach, the map
shows not only the products but also the category values of the attributes, which
can then be used for navigation and selection. NL-PCA generalizes ordinary
principal components analysis (PCA) to ordinal attributes (whose categories are
ordered but not necessarily equally spaced) and to categorical attributes. When
all attributes are numerical, NL-PCA reduces to ordinary PCA; when all attributes
are categorical and a so-called multiple nominal transformation is chosen, NL-PCA
is identical to homogeneity analysis, also known as multiple correspondence
analysis.
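As a point of reference for the all-numerical special case, the sketch below computes ordinary PCA. It standardizes the attributes, eigendecomposes their correlation matrix, and uses the first two component scores as map coordinates for the products. The decomposition is done with cyclic Jacobi rotations only to keep the example dependency-free. This is an illustration of plain PCA, not of the NL-PCA algorithm or the authors' implementation. The function names (`standardize`, `jacobi_eigen`, `pca_map`), the toy data, and the choice to standardize each attribute are assumptions.

```python
import math

# Illustrative sketch of ordinary PCA (the all-numerical special case of
# NL-PCA). Function names and data are hypothetical.


def standardize(X):
    """Center every attribute (column) and scale it to unit variance.

    Assumes no attribute is constant (a zero standard deviation would
    cause a division by zero).
    """
    n, k = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(k)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n)
           for j in range(k)]
    return [[(row[j] - means[j]) / sds[j] for j in range(k)] for row in X]


def jacobi_eigen(A, tol=1e-20, max_sweeps=100):
    """Eigendecomposition of a small symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, V); column c of V is the eigenvector belonging
    to eigenvalues[c]. The eigenvalues are not sorted.
    """
    n = len(A)
    A = [row[:] for row in A]  # work on a copy
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-15:
                    continue  # already negligible
                # Rotation angle chosen so that entry (p, q) becomes zero.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for r in range(n):  # A <- A J (columns p and q)
                    arp, arq = A[r][p], A[r][q]
                    A[r][p], A[r][q] = c * arp - s * arq, s * arp + c * arq
                for r in range(n):  # A <- J^T A (rows p and q)
                    apr, aqr = A[p][r], A[q][r]
                    A[p][r], A[q][r] = c * apr - s * aqr, s * apr + c * aqr
                for r in range(n):  # accumulate eigenvectors: V <- V J
                    vrp, vrq = V[r][p], V[r][q]
                    V[r][p], V[r][q] = c * vrp - s * vrq, s * vrp + c * vrq
    return [A[i][i] for i in range(n)], V


def pca_map(X, dims=2):
    """Ordinary PCA of a numerical product-by-attribute matrix X.

    Returns (coords, eigenvalues): coords[i] holds the first `dims`
    component scores of product i; eigenvalues are all eigenvalues of
    the correlation matrix, sorted in descending order.
    """
    Z = standardize(X)
    n, k = len(Z), len(Z[0])
    # Correlation matrix R = Z^T Z / n (valid because Z is standardized).
    R = [[sum(Z[i][a] * Z[i][b] for i in range(n)) / n for b in range(k)]
         for a in range(k)]
    vals, V = jacobi_eigen(R)
    order = sorted(range(k), key=lambda col: vals[col], reverse=True)
    coords = [[sum(Z[i][j] * V[j][col] for j in range(k)) for col in order[:dims]]
              for i in range(n)]
    return coords, [vals[col] for col in order]


# Hypothetical toy catalog: price, weight (kg), screen size (inches).
products = [
    [299.0, 1.2, 13.3],
    [499.0, 1.5, 14.0],
    [899.0, 2.1, 15.6],
    [1299.0, 2.5, 17.3],
    [699.0, 1.1, 13.3],
]
coords, eigenvalues = pca_map(products)
for p, (x, y) in zip(products, coords):
    print(p, round(x, 3), round(y, 3))
```

The printed pairs are the product positions on a two-dimensional map. Plain PCA can only place products, not category values. Plotting the category values as well requires the categorical extension that NL-PCA provides.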
In homogeneity analysis, the $I \times K$ data matrix $X$ is represented by an
indicator matrix $G_k$ for every attribute $k$. Let $L_k$ denote the number of
categories of attribute $k$. Every category $\ell$, $\ell = 1, \ldots, L_k$, has its
own column in the $I \times L_k$ matrix $G_k$, in which a 1 denotes that the object
belongs to this category and a 0 that it does not. A multi-valued categorical
attribute, for which an object can belong to several categories at once, is
modeled using a separate $I \times 2$ indicator matrix for every category, coding
whether or not the object has that category. Missing values are incorporated
through an $I \times I$ binary diagonal matrix $M_k$ for every attribute $k$, with a
diagonal value of 1 if the object's value on attribute $k$ is nonmissing and 0
otherwise. Using this data representation, the loss function for homogeneity
analysis [13, 34] can be defined as