where P(H|S) is the probability of the word hypothesis H given a sequence of segments S, p_i is the a posteriori probability estimated by the neural network for each segment, and L is the number of characters in the word hypothesis H.
In practice, however, the product is a severe rule for fusing the character probabilities: it is enough for a single character to have a probability close to zero to spoil the word probability estimate [12]. Since we assume equal prior word probabilities, an alternative to the product rule is the median rule [12], which computes the average probability as:
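The sensitivity of the product rule to a single low-probability character, and the robustness of averaging, can be illustrated with a small numeric sketch (the function names and the example probabilities are hypothetical, chosen only for illustration):

```python
def product_rule(char_probs):
    # Product rule: multiply the per-character a posteriori probabilities.
    # One near-zero probability drives the whole word score toward zero.
    score = 1.0
    for p in char_probs:
        score *= p
    return score

def average_rule(char_probs):
    # Average of the per-character probabilities: a single poorly
    # recognised character only lowers the word score proportionally.
    return sum(char_probs) / len(char_probs)

# Hypothetical word with one badly recognised character (p = 0.01).
probs = [0.90, 0.95, 0.01, 0.90]
print(product_rule(probs))   # collapses to a value close to zero
print(average_rule(probs))   # stays close to the typical character score
```

With these numbers the product rule returns roughly 0.0077, while the average remains at 0.69, showing why averaging is the less severe fusion rule when one segment is misrecognised.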