Apache Hadoop is an open-source software framework for distributed storage and large-scale processing of datasets on clusters of commodity hardware. This course focuses on capacity management of Hadoop clusters. You will be introduced to the concepts of resource management through scheduling, learn how to use the Fair Scheduler, and plan for scaling. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.
Learning Objectives
Capacity Management
- start the course
- compare the trade-offs between availability and performance
- describe different strategies for resource capacity management
HDFS Capacity
- describe how schedulers perform resource management
- set quotas for the HDFS file system
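HDFS quotas are set per directory with the `hdfs dfsadmin` command: a name quota caps the number of files and directories, and a space quota caps the total bytes consumed (including replication). A brief illustration, using a hypothetical `/user/alice` directory:

```shell
# Limit /user/alice to 1000 names (files + directories)
hdfs dfsadmin -setQuota 1000 /user/alice

# Limit /user/alice to 10 GB of raw space (counts all replicas)
hdfs dfsadmin -setSpaceQuota 10g /user/alice

# Inspect current quotas and usage
hadoop fs -count -q /user/alice

# Remove the quotas again
hdfs dfsadmin -clrQuota /user/alice
hdfs dfsadmin -clrSpaceQuota /user/alice
```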
YARN Capacity
- recall how to set the maximum and minimum memory allocations per container
- describe how the fair scheduling method gives all applications, on average, an equal share of resources over time
- describe the primary algorithm and the configuration files for the Fair Scheduler
- describe the default behavior of the Fair Scheduler methods
- monitor the behavior of Fair Share
- describe the policy for single resource fairness
- describe how resources are distributed over the total capacity
- identify different configuration options for single resource fairness
- configure single resource fairness
- describe the minimum share function of the Fair Scheduler
- configure minimum share on the Fair Scheduler
- describe the preemption functions of the Fair Scheduler
- configure preemption for the Fair Scheduler
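The YARN objectives above center on two configuration files: `yarn-site.xml`, which sets per-container memory bounds and enables Fair Scheduler preemption, and the Fair Scheduler allocation file (`fair-scheduler.xml`), which defines queues, minimum shares, weights, and preemption timeouts. A hedged sketch, with illustrative queue names and values:

```xml
<!-- yarn-site.xml (fragment): container memory bounds and preemption -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.scheduler.fair.preemption</name>
  <value>true</value>
</property>
```

```xml
<?xml version="1.0"?>
<!-- fair-scheduler.xml: example allocation file with two queues -->
<allocations>
  <queue name="production">
    <!-- Minimum share guaranteed to this queue -->
    <minResources>10240 mb,10 vcores</minResources>
    <weight>2.0</weight>
    <!-- Seconds below min share before the queue may preempt containers -->
    <minSharePreemptionTimeout>60</minSharePreemptionTimeout>
  </queue>
  <queue name="adhoc">
    <weight>1.0</weight>
  </queue>
  <defaultQueueSchedulingPolicy>fair</defaultQueueSchedulingPolicy>
</allocations>
```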
Service Performance
- describe dominant resource fairness
- write service levels for performance
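Dominant resource fairness (DRF) extends max-min fairness to multiple resource types: a user's dominant share is the largest fraction of any single cluster resource allocated to that user, and the scheduler repeatedly grants a task to the user with the smallest dominant share. A minimal sketch of that idea, with illustrative capacities and per-task demands (not Hadoop API calls):

```python
# DRF sketch: grant one task at a time to the user with the
# smallest dominant share, until no further task fits.
CAPACITY = {"cpu": 9, "memory_gb": 18}

# Per-task demand for two hypothetical users.
demands = {
    "userA": {"cpu": 1, "memory_gb": 4},
    "userB": {"cpu": 3, "memory_gb": 1},
}

allocated = {u: {r: 0 for r in CAPACITY} for u in demands}

def dominant_share(user):
    # Largest fraction of any one resource held by this user.
    return max(allocated[user][r] / CAPACITY[r] for r in CAPACITY)

def fits(user):
    # Would one more of this user's tasks still fit in the cluster?
    used = {r: sum(allocated[u][r] for u in demands) for r in CAPACITY}
    return all(used[r] + demands[user][r] <= CAPACITY[r] for r in CAPACITY)

while True:
    candidates = [u for u in demands if fits(u)]
    if not candidates:
        break
    for r in CAPACITY:
        allocated[min(candidates, key=dominant_share)][r] += 0  # no-op guard
    chosen = min(candidates, key=dominant_share)
    for r in CAPACITY:
        allocated[chosen][r] += demands[chosen][r]

print(allocated)
```

With these numbers, userA ends with 3 CPUs and 12 GB and userB with 6 CPUs and 2 GB, equalizing both dominant shares at 2/3.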
Practice: Fair Scheduler
- use the Fair Scheduler with multiple users