that the different train stations do exhibit different time-series
patterns. For example, while some train stations show a
morning and an evening peak in their weekday time-series
data plots, the same travel pattern cannot be observed at
other stations. Thus, clustering and further analysis are
needed to identify and classify these different travel
patterns among the train stations.
C. Time-Series Data Cluster Analysis
The TSS node performs the clustering and similarity
analysis on the train station time-series data.
Fig. 8 shows the dendrogram of the time-series data plots
generated in the Result panel of the TSS node. The default
number of clusters for TSS node similarity analysis is 5.
However, the resulting 5 clusters were not satisfactory, as
each cluster still contained several distinct travel patterns
that could be further refined and classified. A trial-and-error
process was therefore used to find the number of clusters at
which each cluster exhibits a unique and interesting passenger
travel pattern.
After much trial and error on the hierarchical clustering, 11
clusters (labeled A – K) were found to be the optimal number
for our analysis. Each cluster exhibits a unique and
interesting passenger travel pattern in its time-series data
plots. Fig. 9 shows the one-week time-series plot for the 11
clusters.
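The trial-and-error search over the number of clusters can be illustrated with a small sketch. It does not reproduce the TSS node: it is a generic agglomerative clustering written in plain Python, and the average linkage, the Euclidean distance, the station names, and the passenger counts are all assumptions made for illustration only.

```python
from itertools import combinations
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(series, n_clusters):
    """Average-linkage agglomerative clustering, stopped at n_clusters.

    series: dict mapping a station name to its list of passenger counts.
    Returns a sorted list of clusters, each a sorted list of station names.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[name] for name in series]
    while len(clusters) > n_clusters:
        # Find the two clusters with the smallest average pairwise distance.
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            total = sum(euclidean(series[a], series[b])
                        for a in clusters[i] for b in clusters[j])
            dist = total / (len(clusters[i]) * len(clusters[j]))
            if best is None or dist < best[0]:
                best = (dist, i, j)
        _, i, j = best
        # Merge cluster j into cluster i (j > i, so index i stays valid).
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Hypothetical stations with made-up counts for six time slots.
profiles = {
    "res_1": [1, 9, 2, 2, 4, 1],   # morning-peak shape
    "res_2": [1, 8, 2, 2, 5, 1],
    "cbd_1": [1, 2, 2, 3, 9, 2],   # evening-peak shape
    "cbd_2": [1, 3, 2, 2, 8, 2],
    "flat_1": [3, 3, 3, 3, 3, 3],  # no clear peak
    "flat_2": [3, 4, 3, 3, 4, 3],
}

# Try several cluster counts and inspect the membership of each cut.
for k in (2, 3, 4):
    print(k, agglomerate(profiles, k))
```

With too few clusters, stations with visibly different shapes end up grouped together, which mirrors the problem reported for the default of 5 clusters. Increasing the count until each group holds a single shape corresponds to the manual refinement described above.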
1) Cluster A – strong morning peak / moderate evening peak
The time-series data plots in cluster A display a strong
morning peak and a relatively weaker evening peak on
weekdays, suggesting that the train stations in cluster A
experienced a high volume of passengers entering the stations
in the morning and a relatively lower volume in the evening.
However, the morning and evening peaks were not observed on
weekends, when the stations received a relatively constant
passenger volume throughout the day. Examining the
composition of cluster A, we found that it is made up of train
stations situated in residential areas. This gives a
preliminary explanation for the weekday morning peak:
passengers living in residential areas were traveling to work
on weekday mornings. As for the relatively lower weekday
evening peak, a possible explanation is that passengers who
had travelled to the schools or small offices located in the
residential areas were returning home from work.
2) Cluster B – strong morning peak
The time-series data plots in cluster B display a strong
morning peak on weekdays. However, the morning peak was not
observed on weekends. Examining the composition of cluster B,
we found that it is made up of LRT stations situated in
residential areas. A possible preliminary explanation for the
weekday morning peak is that passengers living in residential
areas were traveling to work on weekday mornings. Another
interesting observation is that the morning passenger volume
of cluster B was lower than that of cluster A. This might be
due to the limited capacity of the LRT, which has smaller
carriages than the MRT.
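Descriptions such as "strong morning peak" and "moderate evening peak" can be made more concrete by summarizing a weekday profile numerically. The following is a minimal sketch, not a procedure taken from this study: the function name, the hour windows used for the morning and evening periods, and the hourly counts are all hypothetical.

```python
def peak_summary(hourly, morning=(6, 10), evening=(17, 21)):
    """Return (morning peak, evening peak, evening/morning ratio).

    hourly: 24 counts of passengers entering a station on one weekday,
            indexed by hour of day.
    morning, evening: half-open hour windows; these defaults are
            illustrative assumptions, not values from the paper.
    """
    if len(hourly) != 24:
        raise ValueError("expected 24 hourly values")
    am = max(hourly[morning[0]:morning[1]])
    pm = max(hourly[evening[0]:evening[1]])
    ratio = pm / am if am else float("inf")
    return am, pm, ratio


# Made-up weekday profiles shaped like the cluster A and cluster B plots.
cluster_a_like = [20, 10, 5, 5, 10, 60, 300, 900, 700, 300, 200, 200,
                  220, 210, 200, 220, 260, 400, 450, 380, 250, 150, 80, 40]
cluster_b_like = [5, 2, 1, 1, 3, 20, 120, 300, 220, 90, 60, 60,
                  65, 60, 60, 65, 70, 90, 100, 90, 70, 40, 20, 10]

a_am, a_pm, a_ratio = peak_summary(cluster_a_like)
b_am, b_pm, b_ratio = peak_summary(cluster_b_like)
print("A:", a_am, a_pm, round(a_ratio, 2))
print("B:", b_am, b_pm, round(b_ratio, 2))
```

Comparing the morning peaks of the two profiles gives the kind of volume difference noted between clusters A and B, while the evening-to-morning ratio separates a moderate evening peak from a negligible one.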