Dataproc offers a variety of methods for executing big data workloads. This course continues the study of Dataproc implementations with Spark and Hadoop using Cloud Shell, and introduces BigQuery and the PySpark REPL package.
Implementation using Dataproc
start the course
describe the various Spark and Hadoop processes that can be performed with Dataproc
recognize the benefits of separating storage and compute services using Cloud Dataproc
recall the process of monitoring and logging Dataproc jobs
demonstrate the process of using an SSH tunnel to connect to the master and worker nodes in a cluster
define the Spark REPL package and how it's used in Linux
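The SSH tunnel objective above can be sketched in code. The following is a minimal sketch, not part of the course materials: Dataproc names a cluster's master node `<cluster-name>-m`, and the documented way to reach its web interfaces is a `gcloud compute ssh` SOCKS proxy. The cluster name, zone, and port used here are illustrative placeholders.

```python
# Sketch: build the gcloud command that opens a SOCKS proxy (SSH tunnel)
# to a Dataproc cluster's master node. Cluster name, zone, and port are
# illustrative assumptions, not values from the course.

def build_master_tunnel_cmd(cluster: str, zone: str, port: int = 1080) -> list[str]:
    """Return the gcloud invocation for an SSH tunnel to the master node.

    Dataproc names the master node '<cluster>-m'; arguments after '--'
    are passed through to the underlying ssh client.
    """
    return [
        "gcloud", "compute", "ssh", f"{cluster}-m",
        f"--zone={zone}",
        "--",                 # everything after this goes to ssh itself
        "-D", str(port),      # open a dynamic (SOCKS) proxy on this local port
        "-N",                 # no remote command; tunnel only
    ]

cmd = build_master_tunnel_cmd("my-cluster", "us-central1-a")
print(" ".join(cmd))
```

In Cloud Shell (where `gcloud` is available) this command list could be run with `subprocess.run(cmd)`, after which the cluster's YARN and Spark web UIs are reachable through the local proxy. Worker nodes (`<cluster>-w-0`, `<cluster>-w-1`, ...) can be reached the same way.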
Implementation using Cloud Shell
describe compute and storage processes, the benefits of separating them, and the virtualized distribution of Hadoop
define BigQuery and its benefits for large-scale analytics
describe the MapReduce programming model
demonstrate the process of submitting multiple jobs with Dataproc
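The MapReduce programming model listed above can be illustrated with a small word-count sketch in plain Python: a map phase emits (word, 1) pairs, a shuffle phase groups the pairs by key, and a reduce phase sums each group. This mirrors what a Hadoop or Spark job does across a cluster; the input lines here are illustrative.

```python
from collections import defaultdict
from functools import reduce

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: fold each key's list of values into a single count.
    return {key: reduce(lambda a, b: a + b, values)
            for key, values in groups.items()}

lines = ["big data on Dataproc", "data pipelines move big data"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts["data"])  # → 3
```

In a real cluster the map and reduce phases run in parallel on many worker nodes and the shuffle moves data between them over the network; the local version above keeps only the programming model.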
Practice: Dataproc Implementations
recognize the various Dataproc and Cloud Shell job operations and implementations
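The job-submission operations practiced above can be sketched as follows. `gcloud dataproc jobs submit pyspark` is the documented submission command, but the script names, cluster, and region below are hypothetical placeholders, not values from the course.

```python
def build_submit_cmd(script: str, cluster: str, region: str) -> list[str]:
    # Build one 'gcloud dataproc jobs submit pyspark' invocation.
    return [
        "gcloud", "dataproc", "jobs", "submit", "pyspark", script,
        f"--cluster={cluster}",
        f"--region={region}",
    ]

# Submitting multiple jobs is just issuing one command per script;
# Dataproc queues and runs them on the target cluster.
scripts = ["ingest.py", "transform.py", "report.py"]  # hypothetical job scripts
commands = [build_submit_cmd(s, "my-cluster", "us-central1") for s in scripts]
for cmd in commands:
    print(" ".join(cmd))
```

Each command could be executed from Cloud Shell with `subprocess.run(cmd)`; job status and driver output can then be monitored with `gcloud dataproc jobs list` and `gcloud dataproc jobs wait`.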