When examining Hadoop availability, it's important not to focus solely on the NameNode. The temptation is understandable: in a deployment without high availability, the NameNode is the single point of failure for HDFS, and many components in the ecosystem rely on HDFS. Hadoop availability, however, is a broader issue. In this course, we examine availability and failure recovery for the NameNode, DataNode, HDFS, and YARN. This learning path can be used as part of your preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.
Learning Objectives
Availability of Hadoop
- start the course
- describe how Hadoop leverages fault tolerance
- recall the most common causes for NameNode failure
- recall the uses for the Checkpoint node
- test the availability for the NameNode
- describe the operation of the NameNode during a recovery
- swap to a new NameNode
- recall the most common causes for DataNode failure
- test the availability for the DataNode
- describe the operation of the DataNode during a recovery
- set up the DataNode for replication
High Availability for HDFS
- identify and recover from a missing data block scenario
- describe the functions of Hadoop high availability
- edit the Hadoop configuration files for high availability
- set up a high availability solution for NameNode
- recall the requirements for enabling automatic failover for the NameNode
- configure automatic failover for the NameNode
YARN Containers
- recall the most common causes for YARN task failure
- describe the functions of YARN containers
- test YARN container reliability
YARN Jobs
- recall the most common causes of YARN job failure
- test application reliability
High Availability for YARN
- describe the system view of a ResourceManager configured for high availability
- set up high availability for the ResourceManager
Practice: Managing Availability
- move the ResourceManager HA configuration to alternate master servers