Gaussian processes are typically used for regression, where it is assumed that the underlying
function is generated by an infinite-dimensional Gaussian distribution (i.e. we assume
a Gaussian prior distribution). In Gaussian process regression (GPR) we further assume
that the outputs are generated with additive Gaussian noise, i.e. we assume a Gaussian likelihood
model. GPR can be generalized by using likelihood models from the exponential
family of distributions, which is useful for classification and for the prediction of lifetimes or
counts. The support vector machine (SVM) is a variant in which the likelihood model is
not derived from the exponential family of distributions but rather uses functions with a
discontinuous first derivative. In this paper we introduce another generalization of GPR
in the form of the mixture of Gaussian processes (MGP) model, which is a variant of the well-known
mixture of experts (ME) model of Jacobs et al. (1991). The MGP model allows
Gaussian processes to model general conditional probability densities. An advantage of
the MGP model is that it is fast to train compared with the neural network ME model.
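Written out, the conditional density represented by such a mixture has the generic mixture-of-Gaussians form (the notation below is chosen here for illustration; the gating and expert models are defined precisely later in the paper):

$$
p(y \mid x) \;=\; \sum_{i=1}^{M} g_i(x)\, \mathcal{N}\!\big(y \mid \mu_i(x), \sigma_i^2(x)\big),
\qquad \sum_{i=1}^{M} g_i(x) = 1,
$$

where $\mu_i(x)$ and $\sigma_i^2(x)$ denote the predictive mean and variance of the $i$-th GPR expert and $g_i(x)$ is an input-dependent gating probability. In contrast to a single GPR model, whose predictive density at any input is a single Gaussian, such a mixture can represent multimodal and otherwise non-Gaussian conditional densities.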
More interestingly, the MGP model is one possible approach to addressing the problem
of input-dependent bandwidth requirements in GPR. Input-dependent bandwidth is useful
if either the complexity of the map varies over the input space (requiring a higher bandwidth in
regions of high complexity) or the density of the input data varies over the input space. In the
latter case, one would prefer Gaussian processes with a higher bandwidth in regions with
many data points and a lower bandwidth in regions with lower data density. If GPR models
with different bandwidths are used, the MGP approach allows the system to self-organize
by locally selecting the GPR model with the locally appropriate bandwidth.
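As a rough numerical illustration of this self-organization idea, the sketch below fits two GPR models with different bandwidths (implemented via different squared-exponential length scales, where a short length scale corresponds to a high bandwidth) to a toy data set whose complexity varies over the input space, and combines their predictions with input-dependent weights. The gating rule used here, weighting each model by a smoothed local leave-one-out predictive likelihood, is only a stand-in for the responsibilities an MGP would learn; all names and parameter values are made up for the example.

```python
import numpy as np

def sq_exp_kernel(A, B, length_scale):
    """Squared-exponential covariance; a short length scale = high bandwidth."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gpr_predict(X, y, Xs, length_scale, noise_var=0.01):
    """Standard GPR prediction: Gaussian process prior plus additive Gaussian noise."""
    K = sq_exp_kernel(X, X, length_scale) + noise_var * np.eye(len(X))
    Ks = sq_exp_kernel(Xs, X, length_scale)
    mean = Ks @ np.linalg.solve(K, y)
    # Predictive variance of a noisy observation (unit prior amplitude assumed).
    var = 1.0 + noise_var - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    return mean, var

def loo_loglik(X, y, length_scale, noise_var=0.01):
    """Closed-form leave-one-out predictive log-likelihood per training point."""
    K = sq_exp_kernel(X, X, length_scale) + noise_var * np.eye(len(X))
    Kinv = np.linalg.inv(K)
    alpha = Kinv @ y
    var = 1.0 / np.diag(Kinv)        # LOO predictive variance
    mu = y - alpha * var             # LOO predictive mean
    return -0.5 * np.log(2.0 * np.pi * var) - 0.5 * (y - mu) ** 2 / var

def smoothed(values, Xs, X, smooth_scale=0.3):
    """Average per-point scores over nearby training inputs to get a function of x."""
    w = sq_exp_kernel(Xs, X, smooth_scale)
    return (w * values).sum(axis=1) / w.sum(axis=1)

# Toy data: rapidly varying on the left half of the input range, smooth on the right.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 4.0, size=(80, 1))
y = np.where(X[:, 0] < 2.0, np.sin(8.0 * X[:, 0]), 0.3 * X[:, 0]) \
    + 0.1 * rng.standard_normal(80)
Xs = np.linspace(0.0, 4.0, 200)[:, None]

# Two GPR "experts": high bandwidth (short length scale) and low bandwidth (long one).
m_fine, v_fine = gpr_predict(X, y, Xs, length_scale=0.1)
m_coarse, v_coarse = gpr_predict(X, y, Xs, length_scale=1.0)

# Input-dependent gate: logistic comparison of the smoothed local LOO scores.
s_fine = smoothed(loo_loglik(X, y, 0.1), Xs, X)
s_coarse = smoothed(loo_loglik(X, y, 1.0), Xs, X)
g_fine = 1.0 / (1.0 + np.exp(s_coarse - s_fine))

# Mixture prediction: locally prefers whichever bandwidth explains the data better.
m_mix = g_fine * m_fine + (1.0 - g_fine) * m_coarse
print(g_fine[:5], g_fine[-5:])  # the gate favours the high-bandwidth expert for small x
```

In an actual MGP the gates and experts would be learned jointly rather than combined by such a fixed heuristic, but the sketch shows the intended behaviour: where the map is complex the high-bandwidth model dominates, and elsewhere the smoother model can take over.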