Speech recognition is a very important part of this system. In this phase, speech samples are obtained from the speaker in real time and stored for preprocessing. Speech recognition requires a microphone to receive the voice signal, and acquisition can easily be done with the microphone built into a mobile phone. In the acquisition phase, the captured signal depends on the configuration of each device, so samples from different users need to be stored to make the system compatible with any type of voice.

HMM-based automatic recognition was used to recognize the speech. For continuous phoneme recognition, a phoneme correct rate of 86% was achieved for normal-hearing speakers. The Sphinx framework is used for speech preprocessing; it was the best tool found for acquiring speech signals. Sphinx is designed with a high degree of flexibility and modularity.

Recognition, or pattern classification, is the process of comparing the unknown test pattern with the reference pattern of each sound class and computing a measure of similarity (distance) between the test pattern and each reference pattern. The digit is recognized using a maximum likelihood estimate, such as the Viterbi decoding algorithm: the digit whose model has the maximum probability is taken to be the spoken digit. Preprocessing, feature vector extraction, and codebook generation are the same as in HMM training. The input speech sample is preprocessed and its feature vectors are extracted. Then, for each frame, the index of the nearest codebook vector is sent to all digit models. The model with the maximum probability is chosen as the recognized digit.
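The recognition steps described above (quantize each frame against the codebook, score the resulting index sequence with every digit model, pick the maximum) can be illustrated with a minimal sketch. This is not the Sphinx implementation used in the system. The function names `quantize`, `viterbi_log_score`, `recognize`, and `toy_model`, the two-codeword codebook, and all model parameters are hypothetical values chosen only for illustration. Each digit model is assumed to be a discrete HMM given by log initial, transition, and emission probabilities.

```python
import numpy as np

def quantize(frames, codebook):
    """Return, for each feature frame, the index of its nearest codebook vector."""
    # Squared Euclidean distance between every frame and every codeword.
    dist = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

def viterbi_log_score(obs, log_pi, log_A, log_B):
    """Log-probability of the single best state path of a discrete HMM."""
    delta = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        # For each next state j, keep the best predecessor i: delta[i] + log_A[i, j].
        delta = (delta[:, None] + log_A).max(axis=0) + log_B[:, o]
    return delta.max()

def recognize(frames, codebook, models):
    """Score the quantized frames with every model and return the best label."""
    obs = quantize(frames, codebook)
    scores = {label: viterbi_log_score(obs, *params)
              for label, params in models.items()}
    return max(scores, key=scores.get), scores

def toy_model(p_symbol0):
    """Hypothetical 2-state HMM whose states emit symbol 0 with probability p_symbol0."""
    pi = np.array([0.5, 0.5])
    A = np.array([[0.9, 0.1], [0.1, 0.9]])
    B = np.array([[p_symbol0, 1 - p_symbol0]] * 2)
    return np.log(pi), np.log(A), np.log(B)

# Assumed toy data: a 2-word codebook and two "digit" models.
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
models = {"zero": toy_model(0.8), "one": toy_model(0.2)}
frames = np.array([[0.9, 1.1], [1.0, 0.8], [0.1, 0.0], [1.2, 1.0]])

label, scores = recognize(frames, codebook, models)
print(label, scores)
```

Working in the log domain avoids numerical underflow when the frame sequence is long. In a real system the codebook and the HMM parameters would come from the training phase rather than being written by hand.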