Clusters are used to store and analyze large volumes of data in a distributed computing environment. This course outlines the best practices to follow when implementing Hadoop clusters.
- configure SSH and install Java on an Ubuntu server in preparation for Hadoop
- set up Hadoop on a single node
- set up Hadoop on four nodes
- describe the different cluster configurations, including single-rack deployments, three-rack deployments, and large-scale deployments
- add a new node to an existing Hadoop cluster
- format HDFS and configure common options
- run an example MapReduce job to perform a word count
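As a taste of the configuration covered above, the snippet below is a minimal sketch of the core HDFS settings set before formatting the filesystem. The hostname `namenode` and the replication factor are illustrative assumptions; actual values depend on your cluster.

```xml
<!-- core-site.xml: tells clients and daemons where the NameNode lives.
     "namenode" is a placeholder hostname for this sketch. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: replication factor of 3 is a common default for
     multi-node clusters; use 1 on a single-node setup. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```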
Practice: Clusters in Hadoop
- start a Hadoop cluster and run a MapReduce job
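The practice task above might look roughly like the following command sequence, assuming a cluster that is already configured, `HADOOP_HOME` set, and the Hadoop binaries on the `PATH`; the input file name and HDFS paths are illustrative.

```shell
# One-time format of the NameNode (destroys any existing HDFS data)
hdfs namenode -format

# Start HDFS (NameNode, DataNodes) and YARN (ResourceManager, NodeManagers)
start-dfs.sh
start-yarn.sh

# Stage some input text in HDFS ("book.txt" is a placeholder file)
hdfs dfs -mkdir -p /user/hadoop/input
hdfs dfs -put book.txt /user/hadoop/input

# Run the word-count example that ships with Hadoop
hadoop jar "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    wordcount /user/hadoop/input /user/hadoop/output

# Inspect the result
hdfs dfs -cat /user/hadoop/output/part-r-00000
```

Note that `wordcount` writes its results to a new output directory, which must not already exist when the job is submitted.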
If you would like to provide feedback for this course, please e-mail the NICCS SO at NICCS@hq.dhs.gov.