NoSQL characteristics
The following characteristics are found in most NoSQL systems.
(a) Schema-less: Relational databases enforce a strict schema that every row (tuple) must follow. Each attribute of a tuple is atomic, is drawn from a prescribed domain, and may carry constraints such as NOT NULL or UNIQUE. A schema-less store imposes no such fixed structure, so records in the same collection may carry different attributes. Systems whose data changes shape frequently are therefore the best fit for a schema-less data store such as NoSQL.
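The contrast can be sketched in a few lines of Python. This is only an illustration using plain lists and dictionaries; the column set and the helper names `insert_strict` and `insert_schemaless` are hypothetical and do not correspond to any particular product's API.

```python
# Hypothetical illustration: a fixed schema versus a schema-less collection.
REQUIRED_COLUMNS = {"id", "name", "email"}  # assumed relational-style schema


def insert_strict(table, row):
    """Reject any row whose attributes do not match the schema exactly."""
    if set(row) != REQUIRED_COLUMNS:
        raise ValueError(f"row does not match schema: {sorted(row)}")
    table.append(row)


def insert_schemaless(collection, document):
    """Accept any document; only an identifying key is required."""
    if "id" not in document:
        raise ValueError("document needs an id")
    collection.append(document)


table, collection = [], []
insert_strict(table, {"id": 1, "name": "Ann", "email": "ann@example.org"})

# Documents in the same collection may carry different attributes.
insert_schemaless(collection, {"id": 1, "name": "Ann"})
insert_schemaless(collection, {"id": 2, "name": "Bob", "tags": ["admin"], "age": 41})

# The same differently shaped record is refused by the fixed schema.
try:
    insert_strict(table, {"id": 2, "name": "Bob", "age": 41})
    strict_rejected = False
except ValueError:
    strict_rejected = True
```

Adding the `age` attribute required no change on the schema-less side, whereas the strict table would first need a schema migration.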
(b) Shared Nothing architecture: Instead of using a common storage pool such as a SAN, each server uses only its own local storage. Storage is therefore accessed at local disk speed rather than network speed, and capacity can be increased by adding more nodes. Costs are also reduced, since commodity hardware can be used with a “Shared Nothing” architecture.
(c) Elasticity: Elasticity means that a database can be expanded dynamically. When a new node is added to the mesh, a subset of the data is replicated to the new node. Elastic databases that emphasize BASE principles have to give up the traditional paradigm of ACID transactions, although individual implementations can choose how far to prioritize availability over other requirements such as consistency. The following systems support elasticity in principle:
• Cassandra.
• Dynomite.
• Amazon’s Dynamo.
• Voldemort.
• MapReduce.
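The text does not say how the subset of data destined for a new node is chosen. One common technique in Dynamo-style stores is consistent hashing, and the sketch below assumes it purely for illustration. The `Ring` class, the node names, and the choice of 64 virtual nodes are all hypothetical.

```python
import bisect
import hashlib


def h(value: str) -> int:
    """Map a string to a position on the hash ring."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class Ring:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, vnodes=64):
        self.vnodes = vnodes
        self._points = []  # sorted list of (position, node name)

    def add_node(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._points, (h(f"{node}#{i}"), node))

    def owner(self, key):
        # The first ring point at or after the key's position owns the key.
        idx = bisect.bisect_right(self._points, (h(key), ""))
        return self._points[idx % len(self._points)][1]


ring = Ring()
for node in ("node-a", "node-b", "node-c"):
    ring.add_node(node)

keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.owner(k) for k in keys}

ring.add_node("node-d")  # elastic expansion: one more node joins
after = {k: ring.owner(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved to node-d")
```

Only the keys that now fall to `node-d` change owner; every other key stays where it was, which is what makes adding a node cheap.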
(d) Sharding: Instead of treating record storage as one heap of memory locations, records are partitioned into shards, each small enough to be managed by a single server. Shards are replicated as required and split when they grow too large. Applications assist in sharding by assigning each record a partition ID automatically.
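A minimal sketch of the idea, assuming range-based shards held in memory: the `ShardedStore` class and the `MAX_SHARD_SIZE` threshold are hypothetical names chosen for illustration, and replication of shards is omitted.

```python
import bisect

MAX_SHARD_SIZE = 4  # assumed threshold at which a shard is split


class ShardedStore:
    """Range-partitioned key-value store that splits oversized shards."""

    def __init__(self):
        self.bounds = []    # sorted split keys; shard i holds keys below bounds[i]
        self.shards = [{}]  # always one more shard than there are bounds

    def partition_id(self, key):
        """Assign the record a partition ID from its key."""
        return bisect.bisect_right(self.bounds, key)

    def put(self, key, value):
        pid = self.partition_id(key)
        self.shards[pid][key] = value
        if len(self.shards[pid]) > MAX_SHARD_SIZE:
            self._split(pid)

    def get(self, key):
        return self.shards[self.partition_id(key)].get(key)

    def _split(self, pid):
        """Split shard pid at its median key into two smaller shards."""
        keys = sorted(self.shards[pid])
        mid = keys[len(keys) // 2]
        left = {k: v for k, v in self.shards[pid].items() if k < mid}
        right = {k: v for k, v in self.shards[pid].items() if k >= mid}
        self.shards[pid:pid + 1] = [left, right]
        self.bounds.insert(pid, mid)


store = ShardedStore()
for i in range(10):
    store.put(f"k{i:02d}", i)
```

Callers never see the split: `store.get("k07")` keeps working because the partition ID is always derived from the current split keys.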
(e) Asynchronous Replication: In contrast to synchronous replication techniques such as “Mirroring” and/or striping, NoSQL systems employ asynchronous replication, which lets writes complete faster because they do not wait on network traffic. The limitation is that data is not replicated immediately, so a write can be lost if a failure occurs in the window before it reaches the replicas. Furthermore, locking is generally not available to protect the copies of a specific unit of data.
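The loss window can be made concrete with a small single-process sketch. The `Primary` and `Replica` classes are hypothetical; a real system would ship the pending writes over the network from a background thread rather than through an explicit `replicate_once` call.

```python
from collections import deque


class Replica:
    def __init__(self):
        self.data = {}


class Primary:
    def __init__(self, replica):
        self.data = {}
        self.replica = replica
        self.pending = deque()  # writes acknowledged but not yet replicated

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))
        return "ack"  # acknowledged before the replica has the data

    def replicate_once(self):
        """Ship one pending write to the replica."""
        if self.pending:
            key, value = self.pending.popleft()
            self.replica.data[key] = value


replica = Replica()
primary = Primary(replica)
primary.write("a", 1)
primary.write("b", 2)
primary.replicate_once()  # only the first write has been shipped so far

# If the primary fails at this point, "b" was acknowledged but is on no replica.
lost_on_failover = set(primary.data) - set(replica.data)
```

A synchronous scheme would close this window by returning `"ack"` only after the replica had confirmed the write, at the cost of slower writes.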
(f) BASE instead of ACID: The acronym BASE (Basically Available, Soft state, Eventual consistency) was purposely chosen to contrast with the ACID paradigm (Atomicity, Consistency, Isolation, Durability). NoSQL emphasizes “Availability” and “Performance”. Building a database that provides ACID properties is difficult, so Consistency and Isolation are often forfeited, which is why most such applications follow the BASE approach. One of the basic ideas behind BASE is that data consistency is the developer’s problem and should not be handled by the database. In a “Shared Something” environment, ACID is wanted to enforce Consistency at the end of each transaction, whereas in a “Shared Nothing” environment BASE is implemented. The features of ACID and BASE are given in Table 1.
The technical mechanisms for “Commit”, “Transaction processing”, etc. do not differ much between ACID and BASE, but their usage does. ACID comes from the paradigm of one database with “Many users”, in which transactions on a dataset are executed one at a time and only one transaction at a time is able to change a value. For ACID, deadlocks are a challenge. Some NoSQL solutions, such as CouchDB, are fully ACID, and some can be configured to behave as ACID [3]. BASE, in contrast, comes from the paradigm where data is distributed and synchronizing it is not feasible. In BASE, exact values are not strictly necessary. A challenge for BASE is how to update distributed data while the data takes many unforeseen routes.
Table 2 summarizes the discussion. Some applications, financial transactions in particular, cannot cope with the features provided by BASE. For example, in an online banking application, a user who has transferred a certain amount of money to another account must receive confirmation of the successful transaction instantly. If the balance of the originating account appears untouched because of “eventual consistency”, the user might issue the transfer again and end up transferring the money multiple times. Hence online banking applications in general cannot cope with BASE properties, whereas for applications such as Facebook, reading an old value at a given instant is acceptable, so they can be a good fit for BASE.
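The banking scenario can be traced with a toy model. The `EventuallyConsistentAccount` class and the amounts are hypothetical; the model simply assumes that reads are served from a copy that lags behind the writes until `converge` is called.

```python
class EventuallyConsistentAccount:
    """Toy account whose reads come from a lagging copy of the balance."""

    def __init__(self, balance):
        self.primary_balance = balance
        self.replica_balance = balance  # what the user's reads see
        self._unapplied = []            # debits not yet visible to reads

    def transfer(self, amount):
        self.primary_balance -= amount
        self._unapplied.append(amount)

    def read_balance(self):
        return self.replica_balance     # may be stale

    def converge(self):
        """Apply outstanding debits so that reads catch up with writes."""
        for amount in self._unapplied:
            self.replica_balance -= amount
        self._unapplied.clear()


account = EventuallyConsistentAccount(balance=500)
account.transfer(100)

seen_after_first = account.read_balance()  # stale: balance looks untouched
if seen_after_first == 500:
    account.transfer(100)                  # user repeats the "failed" transfer

account.converge()
final_balance = account.read_balance()     # both transfers have been applied
```

Once the copies converge, the account shows that the money left twice, which is the failure mode the paragraph describes.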
In Service Oriented Architecture (SOA), because of the distributed nature of the service capabilities involved and the availability of standards, consistency is difficult to achieve across service boundaries. The BASE approach is more easily implemented, especially outside service boundaries. For task services, and certainly for orchestrated task services, BASE is the default approach. Offering BASE is easier if the accepted margin of error for the business or project principal is known, having been identified during the requirements clarification phase of every service delivery effort. Without proper requirements and business proce