most users expect. This leaves a huge gulf between the size of the Web and what we
can handle with current single-computer technology. Note that this problem is
not restricted to a few major web search companies; many more companies want
to analyze the content of the Web instead of making it available for public search.
These companies have the same scalability problem.
The second factor is simple economics. The incredible popularity of personal
computers has made them very powerful and inexpensive. In contrast, large computers serve a very small market, and therefore have fewer opportunities to develop economies of scale. Over time, this difference in scale has made it difficult to build a computer that is much more powerful than a personal computer yet is still sold at a reasonable price. Many large information retrieval
systems ran on mainframes in the past, but today’s platform of choice consists of
many inexpensive commodity servers.
Inexpensive servers have a few disadvantages when compared to mainframes.
First, they are more likely to break, and the likelihood of at least one server failure goes up as you add more servers. Second, they are difficult to program. Most
programmers are well trained for single-threaded programming, less well trained
for threaded or multi-process programming, and not well trained at all for cooperative network programming. Many programming toolkits have been developed
to help address this kind of problem. RPC, CORBA, Java RMI, and SOAP have
been developed to allow function calls across machine boundaries. MPI provides
a different abstraction, called message passing, which is popular for many scientific
tasks. None of these techniques are particularly robust against system failures, and
the programming models can be complex. In particular, these systems do not help
distribute data evenly among machines; that is the programmer’s job.
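As a rough illustration of the message-passing style that MPI provides, the sketch below uses Python's standard multiprocessing module rather than MPI itself (a real MPI program would use a library such as mpi4py and an MPI runtime, and the word-counting task and queue-based exchange here are illustrative assumptions, not taken from the text). Each worker process receives a shard of data, processes it, and sends a result back:

```python
# A minimal sketch of the message-passing style (MPI-like send/receive),
# illustrated with Python's standard multiprocessing module.
from multiprocessing import Process, Queue

def worker(rank, inbox, outbox):
    # Each worker ("rank") receives one message, does some work,
    # and sends a reply tagged with its rank.
    text = inbox.get()
    outbox.put((rank, len(text.split())))  # e.g., count words in a shard

if __name__ == "__main__":
    inbox, outbox = Queue(), Queue()
    shards = ["the quick brown fox", "jumps over the lazy dog"]
    procs = [Process(target=worker, args=(r, inbox, outbox))
             for r in range(len(shards))]
    for p in procs:
        p.start()
    for shard in shards:
        inbox.put(shard)                   # "send": distribute data
    results = [outbox.get() for _ in shards]  # "receive": gather replies
    for p in procs:
        p.join()
    print(sum(count for _, count in results))  # total words across shards
```

Note that even in this tiny example, splitting the input into shards of roughly equal size is left entirely to the programmer, which is exactly the burden the passage describes.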