• Online, Self-Paced
Course Description

In this course, you will learn how to perform data analysis using Spark SQL and Hive. It is one in a series of courses that prepare learners for exam 70-775: Perform Data Engineering on Microsoft Azure HDInsight.

Learning Objectives

Data Analysis using Spark SQL

  • start the course
  • describe Jupyter and Apache Zeppelin
  • merge DataFrames using Spark SQL
  • describe Apache Parquet
  • manage interactive Livy sessions

Data Analysis using Hive

  • describe what interactive querying is and how it's used with Hive
  • use Ambari Views
  • use HiveQL
  • describe how to parse files such as CSV files with Hive
  • use ORC for caching
  • use Hive tables
  • use Zeppelin to visualize data

Practice: Using Spark Data Analysis

  • perform data analysis using Spark SQL

Framework Connections

The materials within this course focus on the Knowledge, Skills, and Abilities (KSAs) identified within the Specialty Areas listed below. Click to view Specialty Area details within the interactive National Cybersecurity Workforce Framework.