SEHadoop Runtime Model
Some processes in Hadoop hold critical information. A
Name Node manages the file system namespace and holds the
keys used to generate Block Tokens and Delegation Tokens. An
attacker who gains access to the Name Node can fetch all keys
from its memory and access data on all active Data Nodes. A
Resource Manager distributes all Delegation Tokens to the
proper Node Managers and Containers. An attacker can use these
Delegation Tokens to access the data of the Delegation
Tokens’ owners. An Application Master
distributes the Delegation Token of one job to all the job’s map
and reduce processes. Once the keys or Delegation Tokens
have been intercepted, an attacker can access a broad range
of data in HDFS. A user uses a Job Client as the interface to
access HDFS and YARN. The Job Client contains sensitive
information (e.g., Kerberos tickets and Delegation Tokens) and
is able to access all of the user’s data. A Node Manager which
manages Application Masters is responsible for setting up
the proper configuration, booting up the Application Master, and
transferring the Delegation Token to the Application Master.
An attacker who controls such a Node Manager can intercept all
Delegation Tokens sent to an Application Master. Hadoop uses
Kerberos for most authentication operations. By compromising
Kerberos, an attacker can impersonate any user in Hadoop. Therefore,
a Name Node, a Resource Manager, Application Masters,
Job Clients, Kerberos, and Node Managers that manage
Application Masters should run in a secure zone. The remaining
processes of SEHadoop, such as Data Nodes, the other Node
Managers, and Containers, can run in a public cloud.
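
The placement rule described above can be illustrated with a minimal Python sketch. It is not part of SEHadoop. The role strings, the `Zone` and `Process` types, the `hosts_application_master` flag, and the `zone_for` function are hypothetical names introduced here for illustration only. The sketch assumes that each process can be labeled with a single role and that a Node Manager is known in advance to host Application Masters or not.

```python
from dataclasses import dataclass
from enum import Enum


class Zone(Enum):
    SECURE = "secure"
    PUBLIC = "public"


# Roles the text places in the secure zone unconditionally
# (the role strings themselves are illustrative labels).
ALWAYS_SECURE = {
    "NameNode",
    "ResourceManager",
    "ApplicationMaster",
    "JobClient",
    "Kerberos",
}

# Roles the text allows to run in a public cloud.
ALWAYS_PUBLIC = {"DataNode", "Container"}


@dataclass(frozen=True)
class Process:
    role: str
    # Only meaningful when role == "NodeManager" (hypothetical flag).
    hosts_application_master: bool = False


def zone_for(process: Process) -> Zone:
    """Return the zone in which a process should run under the rule above."""
    if process.role in ALWAYS_SECURE:
        return Zone.SECURE
    if process.role == "NodeManager":
        # Node Managers that manage Application Masters handle Delegation
        # Tokens in transit, so they belong in the secure zone.
        if process.hosts_application_master:
            return Zone.SECURE
        return Zone.PUBLIC
    if process.role in ALWAYS_PUBLIC:
        return Zone.PUBLIC
    raise ValueError(f"unknown role: {process.role}")
```

For example, `zone_for(Process("NodeManager", hosts_application_master=True))` evaluates to `Zone.SECURE`, while `zone_for(Process("NodeManager"))` evaluates to `Zone.PUBLIC`. Unknown roles raise an error rather than defaulting to either zone, so that a new process type has to be classified explicitly.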