• Online, Self-Paced
Course Description

Data scientists have a vast toolset available to them, with many moving parts, especially in Python. This course maps out that toolset. It starts with data analysis using pandas, then moves on to machine learning with the SciPy stack and the scikit-learn toolset, including work with prediction data. It then covers data visualization with Python's matplotlib library, time series, and further data engineering operations.

Learning Objectives

Data Analysis Using pandas

  • describe the basic and common functionality of pandas for data science
  • describe the primary data structures in pandas
  • describe hierarchical indexing in pandas
  • perform basic data query operations on a pandas DataFrame
  • perform aggregation operations on a pandas DataFrame
  • perform basic merge operations with pandas DataFrames

Machine Learning with scikit-learn

  • describe the functionality and use of core packages and sub-packages in the SciPy stack
  • use the scikit-learn library to perform basic data standardization
  • use the scikit-learn library to perform basic data normalization
  • use the scikit-learn library to perform simple linear regression analysis
  • use the scikit-learn library to perform supervised learning for optical recognition of handwritten digits

Data Visualization with Python

  • use the Python matplotlib library to plot and display a simple 2D line plot and set its line properties
  • use the Python matplotlib library to create and customize multiple plots in a single figure
  • use the Python matplotlib library to create and customize a box plot
  • use the Python matplotlib library to create and display a heat map
  • use the Python matplotlib library to place legends and annotations on a 2D line plot
  • use pandas to create a scatter plot matrix
  • use the Python matplotlib library to create a 3D plot

Time Series and Forecasting Data

  • create, slice, and resample time series data in Python
  • use pandas to create and manipulate Timedeltas in Python

Data Engineering with Python

  • identify key concepts in Python data cleansing
  • perform data preprocessing and text mining in Python

Working with Databases

  • use pandas to access a MySQL database

Inferential Statistics

  • use the SciPy package to describe various forms of probability distributions

Practice: Integrations in Data Science

  • manage other concepts and processes in data science

Framework Connections

The materials within this course focus on the Knowledge, Skills, and Abilities (KSAs) identified within the Specialty Areas listed below. Click to view Specialty Area details within the interactive National Cybersecurity Workforce Framework.