• Online, Self-Paced
Course Description

Apache Beam, Cloud Dataflow, and Cloud Dataprep can be used to create data pipelines. In this course, you will learn how Apache Beam and its SDKs, Cloud Dataflow, and Cloud Dataprep assist in pipeline management.

Learning Objectives

Expressing Data Pipelines

  • start the course
  • define Apache Beam concepts and SDKs
  • describe the Python SDK and its connection with data pipelines
  • describe the Java SDK and its connection with data pipelines
  • initialize Cloud Dataprep
  • demonstrate how to ingest data into a pipeline
  • create recipes in a Cloud Dataprep pipeline
  • work with the import/export process and demonstrate how to run Dataflow jobs in Cloud Dataprep

Big Data Processing

  • describe MapReduce and the benefits of Cloud Dataflow over MapReduce
  • outline serverless architecture and some of the GCP products supporting data analytics

Practice: Create and Manage Pipelines

  • describe the use of Apache Beam, Cloud Dataflow, and Cloud Dataprep in GCP to create and manage pipelines

Framework Connections

The materials within this course focus on the Knowledge, Skills, and Abilities (KSAs) identified within the Specialty Areas listed below. Click to view Specialty Area details within the interactive National Cybersecurity Workforce Framework.

Feedback

If you would like to provide feedback for this course, please e-mail the NICCS SO at NICCS@hq.dhs.gov.