8. Research Issues
We have described the substantial technical challenges in
developing and deploying decision support systems. While
many commercial products and services exist, there are still
several interesting avenues for research. We will only touch
on a few of these here.
Data cleaning is a problem that is reminiscent of
heterogeneous data integration, a problem that has been
studied for many years. But here the emphasis is on data
inconsistencies instead of schema inconsistencies. Data
cleaning, as we indicated, is also closely related to data
mining, with the objective of suggesting possible
inconsistencies.
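As an illustration of what "suggesting possible inconsistencies" could look like in practice, the following minimal sketch flags values that violate an assumed functional dependency (here, that a zip code determines a city). The table, the column names, and the helper `find_fd_violations` are hypothetical and chosen only for illustration; real cleaning tools use much richer rules and mining techniques.

```python
from collections import defaultdict


def find_fd_violations(rows, key, dependent):
    """Return {key value: sorted conflicting dependent values}.

    Checks the assumed dependency key -> dependent: every key value
    should map to exactly one dependent value.
    """
    seen = defaultdict(set)
    for row in rows:
        seen[row[key]].add(row[dependent])
    # Keep only the key values that map to more than one dependent value.
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}


# Hypothetical customer records; row 3 contains a misspelled city.
customers = [
    {"id": 1, "zip": "94301", "city": "Palo Alto"},
    {"id": 2, "zip": "94301", "city": "Palo Alto"},
    {"id": 3, "zip": "94301", "city": "Paolo Alto"},
    {"id": 4, "zip": "98052", "city": "Redmond"},
]

violations = find_fd_violations(customers, "zip", "city")
print(violations)
```

The check only suggests candidates: it cannot tell which of the conflicting values is correct, which is why such output is normally reviewed or combined with other evidence.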
The problem of physical design of data warehouses should
rekindle interest in the well-known problems of index
selection, data partitioning and the selection of materialized
views. However, while revisiting these problems, it is
important to recognize the special role played by aggregation.
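One reason aggregation matters for physical design can be sketched as follows: for a distributive function such as SUM, a view materialized at a finer grouping can answer a query at a coarser grouping without touching the base data. The sales rows, column names, and the helper `aggregate` below are assumptions made for illustration, not part of any particular product.

```python
from collections import defaultdict


def aggregate(rows, group_by, measure):
    """Compute SUM(measure) GROUP BY the columns listed in group_by."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[col] for col in group_by)] += row[measure]
    return dict(totals)


# Hypothetical base (fact) data.
sales = [
    {"product": "tv", "city": "Seattle", "amount": 300},
    {"product": "tv", "city": "Seattle", "amount": 200},
    {"product": "tv", "city": "Boston", "amount": 100},
    {"product": "radio", "city": "Boston", "amount": 50},
]

# Materialized view: SUM(amount) GROUP BY product, city.
view = aggregate(sales, ["product", "city"], "amount")
view_rows = [
    {"product": p, "city": c, "amount": total} for (p, c), total in view.items()
]

# The coarser query GROUP BY product is answered from the view alone.
by_product_from_view = aggregate(view_rows, ["product"], "amount")
by_product_from_base = aggregate(sales, ["product"], "amount")
print(by_product_from_view == by_product_from_base)
```

This roll-up property does not hold for every aggregate function (a median, for example, cannot be derived this way), so the choice of which views to materialize depends on the aggregates the workload uses.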
Decision support systems already pose growing challenges for
query optimization in the traditional questions of selectivity
estimation and of cost-based algorithms that can exploit
transformations without exploding the search space (there are
plenty of transformations, but few reliable cost estimation
techniques and few smart cost-based algorithms or search
strategies to exploit them). Partitioning the functionality of
the query engine between the middleware (e.g., the ROLAP layer)
and the back-end server is also an interesting problem.
The management of data warehouses also presents new
challenges. Detecting runaway queries, and managing and
scheduling resources are problems that are important but have
not been well solved. Some work has been done on the
logical correctness of incrementally updating materialized
views, but the performance, scalability, and recoverability
properties of these techniques have not been investigated. In
particular, failure and checkpointing issues in load and refresh
in the presence of many indices and materialized views need
further research. The adaptation and use of workflow
technology might help, but this needs further investigation.
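To make the distinction between logical correctness and recoverability concrete, here is a minimal sketch of incrementally maintaining a SUM/COUNT aggregate view from batches of inserted and deleted rows. The view layout (group mapped to `[sum, count]`) and the function names `apply_delta` and `recompute` are assumptions for illustration; this is not a description of any specific published algorithm.

```python
def recompute(rows):
    """Build the view from scratch: group -> [sum, count]."""
    view = {}
    for group, amount in rows:
        entry = view.setdefault(group, [0, 0])
        entry[0] += amount
        entry[1] += 1
    return view


def apply_delta(view, inserted=(), deleted=()):
    """Update the view in place from inserted and deleted base rows."""
    for group, amount in inserted:
        entry = view.setdefault(group, [0, 0])
        entry[0] += amount
        entry[1] += 1
    for group, amount in deleted:
        entry = view[group]
        entry[0] -= amount
        entry[1] -= 1
        if entry[1] == 0:
            # The count tells us when a group has no rows left.
            del view[group]
    return view


base = [("east", 10), ("east", 5), ("west", 7)]
view = recompute(base)

inserted = [("north", 3), ("east", 1)]
deleted = [("west", 7)]
apply_delta(view, inserted, deleted)

new_base = [("east", 10), ("east", 5), ("north", 3), ("east", 1)]
print(view == recompute(new_base))
```

The sketch is logically correct in the sense that the incrementally updated view matches a full recomputation, but it updates the view in place with no logging or checkpointing: a failure partway through `apply_delta` would leave the view in a state that matches neither the old nor the new base data.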
Some of these areas are being pursued by the research
community [33, 34], but others have received only cursory
attention, particularly in relation to data warehousing.
Acknowledgement
We thank Goetz Graefe for his comments on the draft.
