2. Data description and processing
2.1. Original data
Given the data available, we focused our analysis on the records of the Transantiago bus system. Although data exist for
the entire operational period since March 2007, we extracted data for a single week (1 to 7 September 2008).
The database was organized into tables. The bus position table contains geo-coded bus records (latitude, longitude
and time). The GPS device reported bus positions at regular 30-s intervals while the bus was moving (80% of the
observations). If the driver pressed the panic button (which was sometimes pressed accidentally), the position was
recorded every 10 s (approximately 15% of the sample observations). If the bus was not moving, a control record was
taken every 5 min (5% of the data in our 1-week sample). In total, 6178 different buses (identified by license plate)
were observed in the system. A second table contains bus assignments, indicating which route each bus served over a given
period. There were 737 different routes in the system, including variants such as express and short services. The route
assignment information was provided by Transantiago. Joining the two tables yielded 44,476,637 bus position observations
assigned to known routes.
In this section, we describe the two main data-management procedures required to compute commercial speed correctly.
First, we describe the path rectification methodology. Second, we describe the method used to project the locations of
the GPS pulses onto the rectified bus paths.
The GPS pulse data were stored in a PostgreSQL 8.3 database (Stonebraker, 1987). Performance was optimized by adding
indices that favored queries restricted to specific services and time periods. Processing the data for all services took
approximately 1 day of computation. The programs that compute the speeds and the Google Earth interface were written in
C++.