A data mining pioneer: where’s the data?
Dr Usama Fayyad is a data mining pioneer who began working in the field in 1989. He started at
NASA’s Jet Propulsion Laboratory, compiling data on planetary and astronomical phenomena such as
volcanoes and star systems. He then worked for Microsoft Research, and after leaving Microsoft he
founded digiMine to address the problems of data mining and data warehousing. Dr Fayyad describes
the complexity of data mining as follows. Suppose, for example, that you work for a
telecommunications company and want to find records of cell phone fraud. Today’s databases will
give you no answer to this question, because their interfaces were designed for problems where the
target is known and the database is simply commanded to retrieve a result. Without an exact
description of the target, we are lost in the database. Therefore, as Dr Fayyad emphasizes, it is
important to distinguish between the capacity to store data and the capacity to access it
efficiently. The big question today is: Where’s the data?
Dr Fayyad says he realized that “you cannot mine if you can’t have access to the data. And you can’t
have the right data unless you ensure that there’s a successful data warehouse . . . Hence, digiMine
begins from the other end, asking the client what data needs to be mined and how to apply the
algorithms. From there, digiMine sets up the data warehouse and the technology to grab the data
from a variety of formats. The customer installs their software, which they maintain and run from their
data center” (Segal, 2002, from http://itmanagement.earthweb.com/06/11/2002/).