Dataproc offers a variety of methods for running big data workloads. This course continues the study of Dataproc implementations with Spark and Hadoop using Cloud Shell, and introduces BigQuery and the PySpark REPL package.
Implementation using Dataproc
- start the course
- describe the various Spark and Hadoop processes that can be performed with Dataproc
- recognize the benefits of separating storage and compute services using Cloud Dataproc
- recall the process of monitoring and logging Dataproc jobs
- demonstrate the process of using an SSH tunnel to connect to the master and worker nodes in a cluster
- define the Spark REPL package and describe how it's used in Linux
Implementation using Cloud Shell
- describe compute and storage processes, the benefits of separating them, and the virtualized distribution of Hadoop
- define BigQuery and its benefits for large-scale analytics
- describe the MapReduce programming model
- demonstrate the process of submitting multiple jobs with Dataproc
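
The MapReduce programming model named in the objectives above can be illustrated with a minimal, framework-free sketch. This is a plain-Python word count, not Dataproc or Hadoop code. The function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample lines are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(lines):
    """Map step: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle step: group emitted values by key, between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Chain the three steps on an in-memory list of lines (illustrative only)."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input, not course data.
    sample = ["spark and hadoop", "hadoop on dataproc"]
    print(word_count(sample))
```

In a real Hadoop or Spark job the map and reduce steps run in parallel across worker nodes and the shuffle moves data between them. Here all three steps run in a single process so the data flow is easy to follow.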
Practice: Dataproc Implementations
- recognize the various Dataproc and Cloud Shell job operations and implementations
If you would like to provide feedback for this course, please e-mail the NICCS SO at NICCS@hq.dhs.gov.