To carry out data science, you need to gather data. Extracting, parsing, and scraping data from various sources, both internal and external, is a critical first step in the data science pipeline. In this course, you'll explore practical tools for data gathering.
Learning Objectives
Data Extraction
- describe problems and software tools associated with data gathering
- use curl to gather data from the Web
- use in2csv to convert spreadsheet data to CSV format
- use agate to extract data from spreadsheets
- use agate to extract tabular data from DBF files
- extract data from particular tags in an HTML document
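As a rough illustration of the last objective above, extracting data from particular tags, here is a minimal sketch using only Python's standard-library `html.parser`. The class `TagTextExtractor`, the helper `extract_tag_text`, and the sample HTML are hypothetical names and data made up for this example. The course itself may use different tools.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text found inside every occurrence of one tag (hypothetical helper)."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0        # how many matching tags are currently open
        self.results = []     # one string per outermost matching element
        self._buffer = []     # text pieces of the element being read

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            if self.depth == 0:
                self._buffer = []
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self._buffer).strip())

    def handle_data(self, data):
        # Only keep text that appears inside the tag of interest.
        if self.depth > 0:
            self._buffer.append(data)


def extract_tag_text(html, tag):
    """Return the text content of each `tag` element in `html`."""
    parser = TagTextExtractor(tag)
    parser.feed(html)
    parser.close()
    return parser.results


# Made-up sample document standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
<table>
<tr><td>Alice</td><td>30</td></tr>
<tr><td>Bob</td><td>25</td></tr>
</table>
</body></html>
"""

if __name__ == "__main__":
    for cell in extract_tag_text(SAMPLE_HTML, "td"):
        print(cell)
```

In practice the HTML string would come from a file fetched beforehand, for example with curl, rather than from an inline sample.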
Metadata
- distinguish between metadata and data
- work with metadata in HTTP headers
- work with Linux log files
- work with metadata in email headers
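To make the metadata and data distinction concrete, here is a small sketch that separates an email's headers (metadata) from its body (data) using Python's standard-library `email` package. The message text, the addresses, and the helper name `split_message` are invented for illustration and are not taken from the course.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Made-up raw message: headers, a blank line, then the body.
RAW_MESSAGE = """From: alice@example.com
To: bob@example.com
Subject: Quarterly numbers
Date: Mon, 01 Jan 2024 09:00:00 +0000

The report is attached.
"""


def split_message(raw):
    """Return (headers, body): headers as a dict, body as plain text."""
    msg = message_from_string(raw)
    headers = dict(msg.items())   # metadata: who, when, about what
    body = msg.get_payload()      # data: the content itself
    return headers, body


if __name__ == "__main__":
    headers, body = split_message(RAW_MESSAGE)
    sent_at = parsedate_to_datetime(headers["Date"])
    print("Subject:", headers["Subject"])
    print("Sent:", sent_at.isoformat())
    print("Body:", body.strip())
```

The same split applies to HTTP responses, where the headers describe the payload and the body carries it.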
Remote Data
- perform a secure shell connection to a remote server
- copy remote data using secure copy
- synchronize data from a remote server
Practice: Curl and HTML
- download an HTML file and explore table data