Once data has been transformed into a usable format, the next step is preliminary data exploration. In this course, you'll explore examples of practical tools and techniques for data exploration.
Learning Objectives
Introduction to Data Exploration
- start the course
- use csvgrep to search for matching rows in CSV data
- use csvstat to explore values in CSV data
- use csvsql to query CSV data like a SQL database
- use gnuplot to quickly plot data on the command line
- use wc to count words, characters, and lines within a text file
- explore a subdirectory tree from the command line
- use natural language processing to count word frequencies in a text document
- take random samples from a list of records
- find the top rows by value and percent in a data set
- find repeated records in a data set
- identify outliers using standard deviation
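As an illustration of the last objective above, here is a minimal Python sketch of flagging outliers by standard deviation. The function name `find_outliers`, the default threshold of 2 standard deviations, and the sample readings are all assumptions made for this example and are not taken from the course material.

```python
from statistics import mean, stdev

def find_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` sample standard
    deviations away from the mean (hypothetical helper)."""
    if len(values) < 2:
        return []  # stdev needs at least two data points
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mu) > threshold * sigma]

# Made-up sample data with one obviously extreme reading.
readings = [10, 12, 11, 13, 12, 11, 10, 95]
print(find_outliers(readings))
```

A single extreme value inflates both the mean and the standard deviation, so on small data sets a lower threshold or a median-based measure may be a better fit.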
Practice: Exploring Word Frequencies
- perform a word frequency count on a classic book from Project Gutenberg
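A word frequency count of this kind might be sketched in Python as follows. This version uses a simple regular-expression tokenizer rather than a natural language processing library, and the function name `word_frequencies` and the short sample sentence are assumptions for illustration; in practice the text would be read from a downloaded book file.

```python
import re
from collections import Counter

def word_frequencies(text, top_n=10):
    """Return the top_n most common words in text as (word, count) pairs
    (hypothetical helper)."""
    # Lowercase the text and keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)

# Short stand-in for the full text of a book.
sample = "It was the best of times, it was the worst of times"
for word, count in word_frequencies(sample, top_n=5):
    print(word, count)
```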