Data Silos, Lakes, and Streams: Data Lakes on AWS from Skillsoft

Online, Self-Paced

Data warehousing is increasingly moving to the cloud, which makes cloud-based data storage a key skill for data science. In this course, you will learn how to build a data lake on the AWS cloud by storing data in S3 buckets and indexing it with AWS Glue. You will also explore how to run crawlers that scan data in S3 and automatically generate metadata tables in Glue.

Learning Objectives

configure a custom role with specific permissions on AWS
create an S3 bucket and upload files
recognize the different operations that can be performed using the AWS Glue console
create metadata tables in Glue using the web console
perform queries on the Glue data catalog using Athena
perform data crawling on S3 to automatically detect schemas
execute queries on data in crawled tables
perform crawling operations with multiple files in the same path
merge data stored in multiple files in the same folder path
merge data when files have the exact same schema
recall the roles and features of the different AWS services used in the data lake architecture
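
The course itself works through the AWS web console, but the crawl-then-query flow described in the objectives can also be sketched in code. The following is a minimal, illustrative Python sketch that assumes the boto3 SDK is installed and AWS credentials are configured. The helper names (crawler_request, athena_request, run_pipeline) and every resource name in the usage note are hypothetical and are not taken from the course material.

```python
def crawler_request(name, role_arn, database, s3_path):
    """Build keyword arguments for Glue's create_crawler call.

    The crawler scans every file under s3_path and writes the schema
    it detects into metadata tables in the given Glue database.
    """
    return {
        "Name": name,
        "Role": role_arn,  # IAM role with access to Glue and the bucket
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def athena_request(sql, database, output_location):
    """Build keyword arguments for Athena's start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_pipeline(crawler_kwargs, athena_kwargs):
    """Create and start the crawler, then submit a query (needs AWS access)."""
    import boto3  # imported lazily so the helpers above work without it

    glue = boto3.client("glue")
    athena = boto3.client("athena")

    glue.create_crawler(**crawler_kwargs)
    glue.start_crawler(Name=crawler_kwargs["Name"])
    # A real script would wait here (for example by polling glue.get_crawler)
    # until the crawler has finished, so the table exists before querying.
    response = athena.start_query_execution(**athena_kwargs)
    return response["QueryExecutionId"]
```

As a usage example with placeholder values, crawler_request("sales-crawler", "arn:aws:iam::123456789012:role/GlueRole", "sales_db", "s3://example-bucket/sales/") returns the arguments for a crawler that indexes every file under that folder path. Pointing a single crawler at a folder is also how multiple files in the same path end up merged into one table when their schemas match.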