• Online, Self-Paced
Course Description

Discover how to implement data lakes for real-time data management. Explore data ingestion, data processing, and data life-cycle management using AWS and other open-source ecosystem products.

Learning Objectives

Data Lake: Architectures & Data Management Principles

  • implement Lambda and Kappa architectures to manage real-time big data
  • identify the benefits of adopting the Zaloni data lake reference architecture
  • describe data ingestion approaches and compare the benefits of the Avro and Parquet file formats
  • demonstrate how to ingest data using Sqoop
  • describe the data processing strategies provided by MapReduce v2, Hive, Pig, and YARN for processing data within data lakes
  • recognize how to derive value from data lakes and describe the benefits of critical roles
  • describe the steps involved in the data life cycle and the significance of archival policies
  • implement an archival policy that transitions data between S3 and Glacier according to adopted retention policies
  • ingest data using Sqoop and implement an archival policy that transitions data from S3 to Glacier
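As a concrete illustration of the Sqoop-based ingestion the objectives above describe, the following command sketches a relational-to-HDFS import. The JDBC connection string, database, table name, user, and target directory are all hypothetical placeholders, not values from this course:

```
# Sketch: import the "orders" table from a MySQL database into HDFS
# as Parquet files, using 4 parallel map tasks. All names are examples.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table orders \
  --as-parquetfile \
  --target-dir /data/lake/raw/orders \
  --num-mappers 4
```

Writing the output as Parquet (rather than plain text or Avro) is one of the format trade-offs the course compares: Parquet's columnar layout favors analytical scans, while Avro's row orientation favors record-at-a-time ingestion.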
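The S3-to-Glacier archival objective can be expressed as an S3 lifecycle configuration. The sketch below builds one in Python; the bucket name, key prefix, and day thresholds are illustrative assumptions, and the boto3 call that would apply it is shown commented out because it requires AWS credentials:

```python
# Minimal sketch of an S3 lifecycle (archival) policy: objects under the
# "raw/" prefix transition to Glacier after 90 days and expire after 365.
# Prefix, rule ID, and day counts are example values, not course-mandated.
lifecycle_config = {
    "Rules": [
        {
            "ID": "archive-raw-data",
            "Filter": {"Prefix": "raw/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}

# Applying the policy would use boto3 (hypothetical bucket name):
# import boto3
# s3 = boto3.client("s3")
# s3.put_bucket_lifecycle_configuration(
#     Bucket="my-data-lake-bucket",
#     LifecycleConfiguration=lifecycle_config,
# )

rule = lifecycle_config["Rules"][0]
print(rule["Transitions"][0]["StorageClass"])  # → GLACIER
```

The same structure can also be saved as JSON and applied with `aws s3api put-bucket-lifecycle-configuration` from the CLI.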

Framework Connections

The materials within this course focus on the Knowledge, Skills, and Abilities (KSAs) identified within the Specialty Areas listed below. Click to view Specialty Area details within the interactive National Cybersecurity Workforce Framework.