Tremendous and potentially infinite volumes of data streams are often generated by
real-time surveillance systems, communication networks, Internet traffic, on-line transactions
in the financial market or retail industry, electric power grids, industry production
processes, scientific and engineering experiments, remote sensors, and other
dynamic environments. Unlike traditional data sets, stream data flow in and out of a
computer system continuously and with varying update rates. They are temporally ordered,
fast changing, massive, and potentially infinite. It may be impossible to store an entire
data stream or to scan through it multiple times due to its tremendous volume. Moreover,
stream data tend to be of a rather low level of abstraction, whereas most analysts
are interested in relatively high-level dynamic changes, such as trends and deviations. To
discover knowledge or patterns from data streams, it is necessary to develop single-scan,
on-line, multilevel, multidimensional stream processing and analysis methods.
Such single-scan, on-line data analysis methodology should not be confined to only
stream data. It is also critically important for processing nonstream data that are massive.
With data volumes mounting by terabytes or even petabytes, stream data nicely
capture our data processing needs of today: even when the complete set of data is collected
and can be stored in massive data storage devices, single scan (as in data stream
systems) instead of random access (as in database systems) may still be the most realistic
processing mode, because it is often too expensive to scan such a data set multiple times.
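
To make the single-scan constraint concrete, the following minimal Python sketch keeps a running count, mean, and variance of a numeric stream while seeing each value exactly once and storing none of them. It uses Welford's incremental update, a standard technique chosen here purely for illustration. The names `RunningStats` and `sensor_readings` are hypothetical, and the short finite sequence only stands in for an unbounded stream.

```python
from typing import Iterator


class RunningStats:
    """Constant-memory summary of a numeric stream (Welford's update)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Each value is folded into the summary once and then discarded.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Population variance of the values seen so far.
        return self._m2 / self.count if self.count else 0.0


def sensor_readings() -> Iterator[float]:
    """Hypothetical source; a real stream would not terminate."""
    for value in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0):
        yield value


stats = RunningStats()
for reading in sensor_readings():
    stats.update(reading)  # one pass, no buffering of past readings

print(stats.count, stats.mean, stats.variance)
```

Memory use stays fixed no matter how many readings arrive, which is the property that makes this style of computation workable for both true streams and massive stored data sets.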
In this section, we introduce several on-line stream data analysis and mining methods.
Section 8.1.1 introduces the basic methodologies for stream data processing and querying.
Multidimensional analysis of stream data, encompassing stream data cubes and
multiple granularities of time, is described in Section 8.1.2. Frequent-pattern mining
and classification are presented in Sections 8.1.3 and 8.1.4, respectively. The clustering
of dynamically evolving data streams is addressed in Section 8.1.5.
