• Online, Self-Paced
Course Description

The Apache Hadoop software library is a framework for the distributed processing of large datasets across clusters of computers using a simple programming model. Hadoop can scale from a single server to thousands of machines, each offering local computation and storage. This course focuses on performance tuning of a Hadoop cluster. It examines best practices and recommendations for tuning the operating system, memory, HDFS, YARN, and MapReduce. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.

Learning Objectives

Performance Tuning Hadoop Clusters

  • recall the three main functions of service capacity
  • describe different strategies of performance tuning

Performance Tuning Networks

  • list some of the best practices for network tuning
  • install compression

Performance Tuning Servers

  • describe the configuration files and parameters used in performance tuning of the operating system
  • describe the purpose of Java tuning
  • recall some of the rules for tuning the DataNode

Performance Tuning Memory

  • describe the configuration files and parameters used in performance tuning of memory for daemons
  • describe the purpose of memory tuning for YARN
  • recall why the NodeManager kills containers
  • tune memory for the Hadoop cluster

Performance Tuning HDFS

  • describe the configuration files and parameters used in performance tuning of HDFS
  • describe the sizing and balancing of the HDFS data blocks
  • describe the use of TestDFSIO
  • tune HDFS for performance

Performance Tuning YARN

  • describe the configuration files and parameters used in performance tuning of YARN
  • configure speculative execution
  • describe the configuration files and parameters used in performance tuning of MapReduce
  • tune MapReduce for performance
  • describe the practice of benchmarking on a Hadoop cluster
  • describe the different tools used for benchmarking a cluster
  • perform a benchmark of a Hadoop cluster

Modeling Applications

  • describe the purpose of application modeling

Practice: Performance Tuning

  • optimize memory and benchmark a Hadoop cluster

Framework Connections

The materials within this course focus on the Knowledge, Skills, and Abilities (KSAs) identified within the Specialty Areas of the National Cybersecurity Workforce Framework.

Feedback

If you would like to provide feedback for this course, please e-mail the NICCS SO at NICCS@hq.dhs.gov.