• Online, Self-Paced
Course Description

To carry out data science, you need to gather data. Extracting, parsing, and scraping data from various sources, both internal and external, is a critical first step in the data science pipeline. In this course, you'll explore examples of practical tools for data gathering.

Learning Objectives

Data Extraction

  • describe problems and software tools associated with data gathering
  • use curl to gather data from the Web
  • use in2csv to convert spreadsheet data to CSV format
  • use agate to extract data from spreadsheets
  • use agate to extract tabular data from dbf files
  • extract data from particular tags in an HTML document
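
As an illustration of the last objective above, here is a minimal sketch of pulling the text out of one kind of tag in an HTML document. It is not taken from the course materials. It uses only Python's standard-library html.parser, and the class name, helper function, and sample markup are all hypothetical.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text inside every outermost occurrence of one tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag.lower()   # HTMLParser reports tag names in lowercase
        self.depth = 0           # how many matching tags are currently open
        self.results = []        # one string per outermost matching element
        self._buffer = []        # text pieces of the element being read

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            if self.depth == 0:
                self._buffer = []
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self._buffer).strip())

    def handle_data(self, data):
        # Keep text only while inside the tag of interest.
        if self.depth > 0:
            self._buffer.append(data)


def extract_tag_text(html, tag):
    """Return the text of each occurrence of `tag` in `html`."""
    parser = TagTextExtractor(tag)
    parser.feed(html)
    parser.close()
    return parser.results


# Hypothetical sample document, used only for illustration.
SAMPLE_HTML = """
<table>
  <tr><td>alpha</td><td>1</td></tr>
  <tr><td>beta</td><td>2</td></tr>
</table>
"""

if __name__ == "__main__":
    for cell in extract_tag_text(SAMPLE_HTML, "td"):
        print(cell)
```

Calling extract_tag_text with "td" returns the table cells in document order. The same helper could be pointed at any other tag name.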

Metadata

  • distinguish between metadata and data
  • work with metadata in HTTP Headers
  • work with Linux log files
  • work with metadata in email headers
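
To make the distinction between metadata and data concrete, the sketch below splits an email message into its headers (metadata) and its body (data). It is an assumption-laden illustration, not course material. It relies on Python's standard-library email package, and the message text, addresses, and function name are invented.

```python
from email import message_from_string

# Hypothetical raw message, used only for illustration.
RAW_MESSAGE = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Quarterly numbers\n"
    "Date: Mon, 01 Jan 2024 09:00:00 +0000\n"
    "\n"
    "The figures are attached.\n"
)


def split_metadata_and_data(raw):
    """Return (headers, body) for a simple, non-multipart message."""
    msg = message_from_string(raw)
    # Headers describe the message: sender, recipient, subject, date.
    # Note: dict() keeps only the last value of any repeated header.
    headers = dict(msg.items())
    # The payload is the message content itself.
    body = msg.get_payload()
    return headers, body


if __name__ == "__main__":
    headers, body = split_metadata_and_data(RAW_MESSAGE)
    for name, value in headers.items():
        print(f"{name}: {value}")
    print("---")
    print(body.strip())
```

Real messages often repeat headers such as Received. For those, msg.get_all("Received") would return every value rather than just one.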

Remote Data

  • perform a secure shell connection to a remote server
  • copy remote data using a secure copy
  • synchronize data from a remote server

Practice: Curl and HTML

  • download an HTML file and explore table data

Framework Connections

The materials within this course focus on the Knowledge, Skills, and Abilities (KSAs) identified within the Specialty Areas of the National Cybersecurity Workforce Framework listed below.