A reasonable default for your learning rate is 0.01. Values much higher than that tend to make the weights diverge. Your aim is to minimize reconstruction entropy, but that can’t happen if the weights can no longer learn features and classify. Each weight feeds a neuron’s on-off function, which gives the likelihood that the neuron will be activated. If a weight grows too large, that function saturates, the likelihood gets pinned near 0 or 1, and it becomes meaningless.
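
To make the effect concrete, here is a minimal sketch in Python. It is a toy, not a real network: it runs plain gradient descent on a one-weight quadratic loss, then passes the resulting weight through a sigmoid standing in for the on-off function. The names `run_gradient_descent` and `sigmoid`, the curvature of 150, and the input of 1.0 are all assumptions chosen for illustration. On a real model, the rate at which training starts to diverge depends on the data and the architecture.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function (the assumed on-off function)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def run_gradient_descent(learning_rate, curvature=150.0, w0=1.0, steps=20):
    """Minimize the toy loss 0.5 * curvature * w**2 and return the final weight.

    The curvature value is an arbitrary assumption; it sets how large a
    learning rate this particular toy problem can tolerate.
    """
    w = w0
    for _ in range(steps):
        grad = curvature * w       # derivative of the toy loss with respect to w
        w -= learning_rate * grad  # standard gradient descent update
    return w


w_small = run_gradient_descent(0.01)  # each step shrinks |w|
w_large = run_gradient_descent(0.02)  # each step grows |w|

# Activation probability for an input of 1.0 under each final weight.
p_small = sigmoid(w_small * 1.0)
p_large = sigmoid(abs(w_large) * 1.0)

print("lr=0.01 -> weight", w_small, "activation probability", p_small)
print("lr=0.02 -> weight", w_large, "activation probability", p_large)
```

In this toy setup the weight settles toward the minimum at a learning rate of 0.01, while at 0.02 it grows with every step. Once the weight is that large, the sigmoid output is stuck at the top of its range no matter what the input is, which is the sense in which an oversized weight stops carrying information.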