3. NoSQL categories
NoSQL databases are broadly classified into the following two categories:
(a) Aggregate Oriented.
(b) Non-aggregate Oriented.
All Aggregate Oriented NoSQL databases collect information from
various nodes in the network cluster. Map-Reduce can be used to
rearrange the data into different aggregate forms. An Aggregate Oriented
database is useful when the same aggregate is used frequently.
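As a rough illustration of rearranging aggregates with Map-Reduce, the sketch below takes order aggregates and regroups them into per-product totals. It runs in a single process and only mimics the map and reduce steps. The `orders` data and the function names are hypothetical and are not taken from any particular NoSQL product.

```python
from collections import defaultdict

# Hypothetical order aggregates; in a real cluster each could live on a different node.
orders = [
    {"order_id": 1, "customer": "ann",
     "items": [{"product": "pen", "qty": 2}, {"product": "ink", "qty": 1}]},
    {"order_id": 2, "customer": "bob",
     "items": [{"product": "pen", "qty": 5}]},
]


def map_order(order):
    """Map step: emit one (product, qty) pair per line item of an order."""
    for item in order["items"]:
        yield item["product"], item["qty"]


def reduce_quantities(pairs):
    """Reduce step: sum the quantities emitted for each product."""
    totals = defaultdict(int)
    for product, qty in pairs:
        totals[product] += qty
    return dict(totals)


def quantity_by_product(all_orders):
    """Rearrange order aggregates into a new aggregate keyed by product."""
    pairs = (pair for order in all_orders for pair in map_order(order))
    return reduce_quantities(pairs)
```

Calling `quantity_by_product(orders)` regroups data that was stored by order into totals keyed by product, which is the kind of rearrangement the paragraph above describes.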
• Aggregate Oriented: The NoSQL databases under this category are:
• Key/Value databases, such as Amazon Dynamo, Voldemort,
Scalaris, etc.
• Column-family databases, such as Cassandra (used in the banking and
financial industry; writes are faster than reads, so one natural niche
is data analysis) and HBase (used by search engines and by any
application that must scan huge, two-dimensional, join-less
tables).
• Document-based databases, such as MongoDB.
Non-aggregate Oriented NoSQL databases, such as graph databases, are a
superset of Aggregate Oriented NoSQL databases such as Key/Value, Column
family and Document-based stores. Non-aggregate Oriented NoSQL
uses relations heavily, and Non-aggregate databases are more ACID
compliant. For applications involving many complex queries, a
Non-aggregate database such as Neo4j is the best choice.
The basic difference between Aggregate Oriented and
Non-aggregate Oriented databases is that an Aggregate Oriented database
splits relations.
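One way to picture this difference is to hold the same data in both shapes. The sketch below is a toy model under stated assumptions: the identifiers (`order_aggregate`, `nodes`, `edges`, `PLACED`, `CONTAINS`) are hypothetical and do not follow the API of Neo4j or any other product. In the aggregate form the order is one self-contained unit and the link from customer to product is implicit inside it. In the graph form the relations are stored explicitly and can be traversed.

```python
# Aggregate-oriented shape: one self-contained unit per order.
# The customer and the items are embedded, so relations to other
# aggregates are not stored as separate, traversable links.
order_aggregate = {
    "order_id": "o1",
    "customer": {"id": "c1", "name": "Ann"},
    "items": [{"product_id": "p1", "qty": 2}],
}

# Relation-oriented (graph-like) shape: nodes plus explicit edges.
nodes = {"c1": {"name": "Ann"}, "o1": {}, "p1": {"name": "pen"}}
edges = [("c1", "PLACED", "o1"), ("o1", "CONTAINS", "p1")]


def neighbours(node_id, relation):
    """Return the nodes reachable from node_id over edges of one relation type."""
    return [dst for src, rel, dst in edges if src == node_id and rel == relation]


def products_bought_by(customer_id):
    """Traverse customer -> order -> product using the stored relations."""
    return [
        product
        for order in neighbours(customer_id, "PLACED")
        for product in neighbours(order, "CONTAINS")
    ]
```

With the graph shape, `products_bought_by("c1")` follows two stored relations. With the aggregate shape, the same question would require scanning every order aggregate.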
The common feature is that both Aggregate Oriented and
Non-aggregate Oriented NoSQL databases are schema-less. The NoSQL
categories mentioned above and their underlying data storage formats are
briefly discussed below.
(a) Key/value databases
Key-value stores are commonly known as dictionaries or
hash tables. In NoSQL, key-value stores have been adopted because they
scale easily and can grow rapidly. The
basic concept is a globally distributed hash table whose keys
lead to different database servers, which may be scattered all over the world.
Each data item is converted into a key using some unique formula
(typically a hash function), and the key is stored in a lookup table or
directory. When the data
are needed, the key is converted into the location of the data and
the data are retrieved accordingly.
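The lookup described above can be sketched as follows. This is a minimal, single-process illustration, assuming a fixed list of hypothetical server names (`SERVERS`) and simple modulo placement. Production systems often use consistent hashing instead, so that adding or removing a server moves fewer keys. The class and function names are made up for this example.

```python
import hashlib

SERVERS = ["server-a", "server-b", "server-c"]  # hypothetical node names


def server_for(key: str) -> str:
    """Convert a key into the name of the server that should hold it."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(SERVERS)
    return SERVERS[index]


class DistributedHashTable:
    """Toy stand-in for a distributed hash table: one dict per server."""

    def __init__(self):
        self._nodes = {name: {} for name in SERVERS}

    def put(self, key, value):
        # The key alone decides which server stores the value.
        self._nodes[server_for(key)][key] = value

    def get(self, key, default=None):
        # The same computation locates the value again at read time.
        return self._nodes[server_for(key)].get(key, default)
```

For example, after `table = DistributedHashTable()` and `table.put("user:42", {"name": "Ann"})`, a later `table.get("user:42")` recomputes the same server from the key and returns the stored value.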
The majority of the NoSQL databases described in this study
are key/value stores at their core, often providing additional
functionality for access by secondary values. Voldemort [5] is a
key-value storage system used at LinkedIn. Examples of key/value
databases without additional indexing facilities are:
• Berkeley DB.
• MemcacheDB.
• Redis.
• Tokyo Cabinet/Tyrant.
• Riak.
The Pros and Cons of Key/Value Databases are:
Pros:
• Very fast.
• Very scalable.
• Simple model.
• Able to distribute horizontally.
Cons:
• Many data structures (objects) cannot be easily modeled
as key–value pairs.
(b) BigTable
“BigTable” databases, also known as record-oriented or tabular
databases, consist of multiple tables, each containing a set of
addressable rows. Each row consists of a set of values that are treated
as columns. In addition to Google’s BigTable database, other
examples are:
• Azure Tables (Microsoft).
• Cassandra (Apache).
• HBase (Apache Hadoop project).
• Hypertable.
• SimpleDB (Amazon).
• Voldemort (LinkedIn, now open source).
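The tabular structure described above (tables of addressable rows, each row being a set of column values) can be sketched with nested dictionaries. This is only an illustrative in-memory model. The `Table` class and its methods are hypothetical and do not correspond to the API of BigTable, HBase or any other listed system.

```python
class Table:
    """Toy tabular store: each row is addressed by a key and holds named columns."""

    def __init__(self, name):
        self.name = name
        self._rows = {}  # row key -> {column name -> value}

    def put(self, row_key, column, value):
        # Rows are created on first write; different rows may hold different columns.
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key, column, default=None):
        return self._rows.get(row_key, {}).get(column, default)

    def get_row(self, row_key):
        # Return a copy so callers cannot modify the stored row by accident.
        return dict(self._rows.get(row_key, {}))
```

A row is addressed by its key, for example `users.put("row-1", "name", "Ann")` followed by `users.get_row("row-1")`, and no schema forces every row to carry the same columns.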
(c) Columnar databases
Columnar databases are a hybrid between NoSQL and relational
databases. They provide some row-and-column structure, but do
not have the strict rules of relational databases.
Column-oriented databases store and process data by column
instead of by row. Having their origin in analytics and business intelligence,
column stores can be used to build high-performance
applications. Column-oriented stores [6] are viewed less puristically,
also subsuming data stores that integrate column and row orientation.
Column storage is faster because most tables contain many
columns that are rarely used simultaneously by queries. Columnar
databases only perform I/O on the blocks corresponding to
columns that are actually being read or updated. In addition to the
smaller I/O overhead, memory is utilized more efficiently. A
column-oriented data store is also very effective at blocking data based
on each column’s data type, such as Date, Text, etc.
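The I/O argument above can be illustrated with a small sketch that stores the same records in row and column layouts and counts how many cells a single-column query touches. The sample data and function names are hypothetical, and the cell count is only a stand-in for block-level I/O in a real columnar engine.

```python
# Hypothetical records in a row-oriented layout: one dict per row.
rows = [
    {"id": 1, "name": "Ann", "joined": "2020-01-05", "balance": 120.0},
    {"id": 2, "name": "Bob", "joined": "2021-03-17", "balance": 80.5},
    {"id": 3, "name": "Cy", "joined": "2022-07-30", "balance": 10.0},
]


def to_columns(row_records):
    """Pivot row records into one list per column (a column-oriented layout)."""
    columns = {}
    for record in row_records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)


def total_balance_row_store(row_records):
    """Sum one column in a row layout; every cell of every row is touched."""
    total, cells_read = 0.0, 0
    for record in row_records:
        cells_read += len(record)  # the whole row is fetched
        total += record["balance"]
    return total, cells_read


def total_balance_column_store(cols):
    """Sum one column in a column layout; only that column is touched."""
    values = cols["balance"]
    return sum(values), len(values)
```

In this toy model the row layout touches 12 cells to sum `balance`, while the column layout touches only 3, and each column list holds values of a single type.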
(d) Document databases
In document-oriented databases, data is stored as documents.
A set of documents is called a collection. Collections may contain