How does one decide which sigmoidal function to use? In the literature, selection is
justified on neurobiological or mathematical grounds (Williams, 1986; Jordan, 1986),
on pragmatic grounds (Hecht-Nielsen, 1990; Wasserman, 1989), or, most commonly, on
intuitive grounds. Most of the intuition comes from rather vague arguments related to the sigmoid's
squashing capabilities and to the fact that the gain (slope) through the axis can be easily
manipulated. It is generally believed that “the exact details of the sigmoid are not critical in a
backpropagation network” (Caudill, 1990, p. 19). While this is true for the performance of an
already-trained network, there are important differences among sigmoids that can affect training
dramatically. Therefore, attention should focus on choosing an activation function that exhibits
the best properties for training. Our arguments come from this perspective.
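
To make the notion of gain concrete, the following is a minimal sketch, not taken from any of the cited works, of two common sigmoids with an adjustable gain parameter. The function names and the `gain` argument are illustrative assumptions. It shows that both functions squash their input into a bounded range, yet their slopes at the origin differ by a factor of four for the same gain, which is one simple way in which two sigmoids can behave differently during gradient-based training.

```python
import math

def logistic(x, gain=1.0):
    """Logistic sigmoid, range (0, 1). `gain` is an illustrative parameter."""
    return 1.0 / (1.0 + math.exp(-gain * x))

def logistic_slope(x, gain=1.0):
    """Derivative of logistic(x, gain) with respect to x."""
    y = logistic(x, gain)
    return gain * y * (1.0 - y)

def tanh_sigmoid(x, gain=1.0):
    """Hyperbolic tangent sigmoid, range (-1, 1)."""
    return math.tanh(gain * x)

def tanh_slope(x, gain=1.0):
    """Derivative of tanh_sigmoid(x, gain) with respect to x."""
    y = math.tanh(gain * x)
    return gain * (1.0 - y * y)

if __name__ == "__main__":
    # Slope at the origin is gain / 4 for the logistic and gain for tanh.
    for gain in (0.5, 1.0, 2.0):
        print(gain, logistic_slope(0.0, gain), tanh_slope(0.0, gain))
```

Changing `gain` rescales the slope at the origin without changing the output range, which is the sense in which the gain is easy to manipulate.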