3.3. Data analysis framework
The goal of this study is to understand whether the chosen station
is the nearest station (1) or not (0). In this sense, the
logistic regression model works as a classifier that estimates the
probability that the chosen station is the nearest one. The logistic
model estimates the conditional distribution of the response Y given
the input variables X, Pr(Y = 1 | X = x), where Y is the binary outcome
and X comprises the characteristics of train users and their trips.
The model parameters that best fit the data were estimated using
maximum likelihood
estimation. The probability p, which is bounded between 0 and 1, is
readily mapped onto an unbounded range by the logit transformation,
log[p/(1 − p)], which is the natural logarithm of the odds that train
users choose the nearest train station. The transformed response can
therefore be written as a linear function of x; in effect, logistic
regression is a linear model for the log-odds (Faraway, 2005). Fig. 3
summarises the data analysis
procedure. If the chosen station is the nearest station, the
dependent variable is one; otherwise, it is zero. The independent
variables (Xn in Eq. (1)) are the characteristics of the chosen stations,