Abstract
The advent of the cloud computing paradigm has given rise to many innovative proposals for
building large-scale, fault-tolerant, and highly available data management systems. This paper proposes
a taxonomy of large-scale partitioned replicated transactional databases with the goal of providing
a principled understanding of the growing space of scalable and highly available database systems.
The taxonomy is based on the relationship between transaction management and replica management.
We illustrate specific instances of the taxonomy using several recent partitioned replicated database
systems.
1 Introduction
The advent of the cloud computing paradigm has given rise to many innovative proposals for building
large-scale, fault-tolerant, and highly available processing and data management systems. Cloud computing is
premised on the availability of large data centers with thousands of processing devices and servers, which are
connected by a network forming a large distributed system. To meet increasing demands for data storage and
processing power, cloud services are scaled out across increasing numbers of servers. To ensure high availability
in the face of server and network failures, cloud data management systems typically replicate data across servers.
Furthermore, the need for fault tolerance in the face of catastrophic failures has led to replication of data and
services across data centers. Such geo-replication holds the promise of ensuring continuous global access.
This paper considers the problem of providing scalability and high availability for transactional database
systems. These are database systems that support the ongoing operations of many organizations and services,
tracking sales, accounts, inventories, users, and a broad spectrum of similar entities and activities. A key feature
of these systems is support for transactional access to the underlying database. Transactions simplify the
development and operation of complex applications by hiding the effects of concurrency and failures.
To provide scalability and high availability, databases are typically partitioned, replicated, or both. Partitioning
splits a database across multiple servers, allowing the database system to scale out by distributing load across