be modified or removed; new data can only be appended by
reopening the file. HDFS implements a single-writer,
multiple-reader model. When a client opens a file for writing, it
is granted a lease for the file, and no other client can write to that
file. A heartbeat to the NameNode allows the lease to be extended.
Once the lease expires and the file is closed, the changes become
available to readers.
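This append-only, lease-based model is visible in the HDFS client API: a file can be reopened for append, but there is no call for in-place modification. The following Java fragment is a minimal sketch using the standard org.apache.hadoop.fs.FileSystem API; the cluster URI and file path are placeholders.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsAppendSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder cluster URI; adjust to your NameNode address.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
        Path file = new Path("/logs/events.log");

        // Reopening the file for append grants this client the lease;
        // a second concurrent writer fails until the lease is released
        // (by close) or expires.
        try (FSDataOutputStream out = fs.append(file)) {
            out.writeBytes("new record\n");
            // hflush makes the appended data visible to readers.
            out.hflush();
        } // close() releases the lease and finalizes the changes.
    }
}
```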
In Hadoop, queues are allocated a fraction of the capacity
of the grid, in the sense that a certain share of resources is
at their disposal. All applications submitted to a queue have
access to the capacity allocated to that queue. Administrators
can configure soft limits and optional hard limits on the capacity
allocated to each queue.
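As a concrete illustration, these limits are set in capacity-scheduler.xml. The sketch below uses hypothetical queue names and values; the property names follow the YARN CapacityScheduler convention. The dev queue gets a 30% soft guarantee and an optional 50% hard cap.

```xml
<configuration>
  <!-- Two hypothetical queues under the root queue. -->
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>prod,dev</value>
  </property>
  <!-- Soft limits: guaranteed shares of the grid's capacity. -->
  <property>
    <name>yarn.scheduler.capacity.root.prod.capacity</name>
    <value>70</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.capacity</name>
    <value>30</value>
  </property>
  <!-- Optional hard limit: dev may never exceed 50%, even when
       idle capacity is available elsewhere. -->
  <property>
    <name>yarn.scheduler.capacity.root.dev.maximum-capacity</name>
    <value>50</value>
  </property>
</configuration>
```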
(6) Haystack: Even though a needle (a photo) in Haystack is
stored in all the physical volumes of a logical volume, all updates go
through the same logical volume and are applied to every replica
of the photo. A store machine receives requests to create, modify,
and delete photos, and these operations are handled by the same
store machine.
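A hedged sketch of this write path, with entirely hypothetical types (Haystack's store machine is not published as code): the store machine appends the needle to every physical volume backing the logical volume, so the replicas never diverge.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical types illustrating Haystack's write path; names are
// illustrative, not an actual Haystack API.
class StoreMachine {
    /** Appends one needle (photo plus header/footer) to a volume file. */
    interface PhysicalVolume {
        void appendNeedle(long key, byte[] photo) throws IOException;
    }

    /** All physical volumes backing one logical volume. */
    private final List<PhysicalVolume> replicas;

    StoreMachine(List<PhysicalVolume> replicas) {
        this.replicas = replicas;
    }

    /** Create/modify: the same store machine applies the update to
        every replica of the logical volume, keeping them in sync. */
    void write(long photoKey, byte[] photo) throws IOException {
        for (PhysicalVolume volume : replicas) {
            volume.appendNeedle(photoKey, photo);
        }
    }
}
```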
5. Discussion
This section offers a view of which type of NoSQL system is suitable
for a particular scenario.
(1) Web integration: In the present scenario, an attractive feature
of a database system is ease of integration with web and mobile
applications. APIs should be developed in line with the database
system, so that it is easy to create web pages or mobile applications
that interact with it. CouchDB is a popular database for integration
with web and mobile applications [19].
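CouchDB's appeal here is that its entire interface is HTTP with JSON documents, so a web or mobile client can talk to it directly, without a driver. A minimal Java sketch, assuming a local CouchDB on the default port 5984 and a hypothetical database named "photos":

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CouchDbSketch {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Create a document in the (hypothetical) "photos" database.
        // CouchDB speaks plain HTTP/JSON: PUT /db/docid stores a doc.
        HttpRequest put = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:5984/photos/img-001"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(
                        "{\"owner\": \"alice\", \"tags\": [\"beach\"]}"))
                .build();

        HttpResponse<String> resp =
                client.send(put, HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.statusCode() + " " + resp.body());
    }
}
```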
(2) Usage of SSDs: [11] suggests using a solid-state disk (SSD) for
logging, since an SSD can provide durable writes with very low
latency. Hadoop uses SSDs for response-time-critical data of
relatively small size [20].
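The point of SSD-backed logging is that each log record can be forced to stable storage cheaply. Below is a minimal sketch of a durable log append in Java, assuming the log file lives on an SSD-backed path (the path is a placeholder).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DurableLogSketch {
    public static void main(String[] args) throws IOException {
        // Placeholder path; the assumption is that it sits on an SSD,
        // so the force() call below completes quickly.
        Path log = Path.of("/ssd/wal/commit.log");
        try (FileChannel ch = FileChannel.open(log,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            ByteBuffer record = ByteBuffer.wrap(
                    "txn=42 commit\n".getBytes(StandardCharsets.UTF_8));
            ch.write(record);
            // Durable write: block until the record reaches the device.
            // On an SSD this adds far less latency than on spinning disk.
            ch.force(false);
        }
    }
}
```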
(3) Single master: Google embraced the concept of a single
master in its distributed systems as a valuable tool for simplifying
the design. Currently GFS uses one master per cell [21], with
multiple cells per data center. Even though Google has worked
around the issue, this case underscores the importance of
scalability and the limitations of assigning different levels of
responsibility in any distributed system that is expected to scale.
The biggest weakness of this design is that the master can be a
single point of failure. However, this can be overcome by
replicating the master state: "shadow masters" can provide
read-only access in the absence of the primary master.
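A hedged sketch of the read-path implication, with hypothetical interfaces (these names are illustrative, not a GFS API): reads can fall back to a shadow replica of the master state, while mutations would still have to wait for a new primary.

```java
import java.util.List;

// Hypothetical interfaces sketching GFS-style shadow masters.
class MetadataClient {
    interface Master {
        boolean isAlive();
        String lookupChunkLocations(String file); // read-only metadata op
    }

    private final Master primary;
    private final List<Master> shadows; // replicated master state, read-only

    MetadataClient(Master primary, List<Master> shadows) {
        this.primary = primary;
        this.shadows = shadows;
    }

    /** Reads fall back to a shadow master when the primary is down;
        writes are not served by shadows, avoiding split-brain. */
    String lookup(String file) {
        if (primary.isAlive()) {
            return primary.lookupChunkLocations(file);
        }
        for (Master shadow : shadows) {
            if (shadow.isAlive()) {
                return shadow.lookupChunkLocations(file);
            }
        }
        throw new IllegalStateException("no master available");
    }
}
```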
(4) Overcompensated systems: The CAP theorem explores the
tradeoffs between Consistency, Availability, and Partition
tolerance. It only discusses limitations in the face of certain
types of failures, and does not constrain a system's capabilities
during normal operation. CAP allows a system to guarantee all of
ACID alongside high availability when there are no partitions [22].
(5) Consistency versus latency: Studies indicate that latency is a
critical factor in online interactions. An increase as small as 100 ms
can dramatically reduce the probability that customers will continue
to interact with, or return to [23], a web portal. In the case of
e-commerce applications this is directly related to profits. Low
latency can be achieved without violating the CAP theorem, and hence
database management systems are expected to pay increasing attention
to reducing latency.
As a matter of fact, low latency tends to come at the cost of
consistency. With several replicas in the system, a client could
immediately read from the first replica it can access, without
checking consistency. The second option is that the client could opt