• Online, Self-Paced
Course Description

Discover how to work with Spark and its in-memory data management capabilities. The course also covers how to manage and troubleshoot HDInsight clusters using Ambari and the Azure CLI.

Learning Objectives

Data Warehousing with Hadoop: Spark, HDInsight and Cluster Management

  • specify the essential capabilities of Spark and its main architectural components
  • list the data structures used in Spark, along with the RDD and lineage concepts
  • set up Spark clusters using PowerShell and an Azure Resource Manager template
  • describe the relationship between Spark SQL and Hive
  • specify the essential concepts of Spark SQL and DataFrame
  • demonstrate how to customize HDInsight clusters using bootstrap
  • install Hadoop applications on Azure HDInsight
  • illustrate the use of Ambari as a tool for managing clusters
  • manage Hadoop clusters in HDInsight using the Azure CLI
  • specify the approach to troubleshooting and tuning HDInsight clusters
  • monitor Hadoop clusters in HDInsight to collect metrics for analysis
  • set up Spark clusters and manage them using the Ambari GUI

Framework Connections

The materials within this course focus on the Knowledge, Skills, and Abilities (KSAs) identified within the Specialty Areas listed below. Click to view Specialty Area details within the interactive National Cybersecurity Workforce Framework.

Feedback

If you would like to provide feedback for this course, please e-mail the NICCS SO at NICCS@hq.dhs.gov.