An independent argument in favor of tanh concerns the scaling of weight layers in feedforward networks. Rigler et al. (1991) have argued that in networks containing multiple weight layers, the changes to the weights in each layer are scaled disproportionately by the gradient term $\delta_{pj}$ unless measures are taken to adjust for that. They provide a method which maintains the expected value of