Data gathered for data science is often in an unstructured or raw format and must be filtered for content and validity before it can be used. In this course, you'll explore practical tools and techniques for data filtering.
Introduction to Data Filtering
- start the course
- identify common filtering techniques and tools
- extract date elements from common date formats
- parse content types in HTTP headers
- use csvcut to filter CSV data
- use sed to replace values in a text data stream
- drop duplicate records from data
- extract headers from a JPEG image
- use pdfgrep to extract data from searchable PDF files
- detect invalid or impossible data combinations
- parse robots.txt from a website to decide what should and shouldn't be crawled or indexed
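
As an illustration of the robots.txt objective above, here is a minimal sketch in Python using the standard library's urllib.robotparser module. The robots.txt content, the example.com URLs, and the helper names build_parser and filter_crawlable are assumptions made up for this example; they are not taken from the course materials, and the course may use different tools.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def build_parser(robots_text):
    """Build a parser from robots.txt text already held in memory."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed.
    parser.parse(robots_text.splitlines())
    return parser


def filter_crawlable(parser, urls, user_agent="*"):
    """Keep only the URLs that the rules allow this user agent to fetch."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


parser = build_parser(ROBOTS_TXT)
urls = [
    "https://example.com/index.html",
    "https://example.com/private/report.html",
]
crawlable = filter_crawlable(parser, urls)
print(crawlable)
```

To check a live site instead, the same parser can be pointed at a URL with set_url() and read(); the in-memory version is used here so the sketch runs without a network connection.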
Practice: Filtering Dates
- drop records from a CSV file based on date range
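
One possible approach to this practice exercise is sketched below in Python with the standard csv and datetime modules. The sample data, the event_date column name, the ISO 8601 date format, and the drop_in_range helper are all assumptions chosen for illustration. The sketch also assumes that "drop" means removing the records that fall inside the range (inclusive); the exercise may intend the opposite.

```python
import csv
import io
from datetime import date

# Hypothetical sample data; the column names are invented for this sketch.
SAMPLE_CSV = """\
id,event_date
1,2023-01-15
2,2023-06-30
3,2024-02-01
"""


def drop_in_range(csv_text, column, start, end):
    """Return the rows whose date in `column` falls outside [start, end]."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = []
    for row in reader:
        # Assumes ISO 8601 dates (YYYY-MM-DD); other formats need strptime.
        row_date = date.fromisoformat(row[column])
        if not (start <= row_date <= end):
            kept.append(row)
    return kept


kept = drop_in_range(SAMPLE_CSV, "event_date", date(2023, 6, 1), date(2023, 12, 31))
for row in kept:
    print(row["id"], row["event_date"])
```

Parsing each value into a date object before comparing avoids the errors that string comparison can produce when dates are not zero-padded or use a different field order.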
If you would like to provide feedback for this course, please e-mail the NICCS SO at NICCS@hq.dhs.gov.