• Classroom
  • Online, Instructor-Led
Course Description

This interactive course teaches security professionals how to use data science techniques to quickly manipulate and analyze network and security data and uncover valuable insights from it. The course covers the entire data science process, from data preparation, feature engineering and selection, exploratory data analysis, data visualization, machine learning, and model evaluation and optimization to implementation at scale, all with a focus on security-related problems.

Participants will learn how to read data in a variety of common formats and then write scripts to analyze and visualize that data. A non-exhaustive list of the topics covered includes:

  • Writing scripts to efficiently read and manipulate CSV, XML, and JSON files
  • Quickly and efficiently parsing executables, log files, and pcap files, and extracting artifacts from them
  • Making API calls to merge datasets
  • Using the pandas library to quickly manipulate tabular data
  • Effectively visualizing data using Python
  • Preprocessing raw security data for machine learning and feature engineering
  • Building, applying and evaluating machine learning algorithms to identify potential threats
  • Automating the process of tuning and optimizing machine learning models
  • Hunting anomalous indicators of compromise and reducing false positives
  • Using supervised learning algorithms such as Random Forests, Naive Bayes, K-Nearest Neighbors (K-NN), and Support Vector Machines (SVM) to classify malicious URLs and identify SQL injection
  • Applying unsupervised learning algorithms such as K-Means clustering to detect anomalous behavior
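
As a rough illustration of the data reading and feature engineering topics listed above, the sketch below reads a small CSV of URLs and turns each URL into a few numeric features that a classifier could consume. It is not taken from the course materials: the sample data, the labels, the chosen features, and the helper names (host_is_ip, url_features, load_features) are all assumptions made for this example, and it uses only the Python standard library.

```python
import csv
import io
import ipaddress
from urllib.parse import urlparse

# Hypothetical sample data; the URLs and labels are invented for illustration.
SAMPLE_CSV = """url,label
http://example.com/index.html,benign
http://192.0.2.10/login.php?id=1,suspicious
"""


def host_is_ip(host):
    """Return True if the host part of a URL is a literal IP address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_features(url):
    """Turn one URL string into a small dictionary of numeric features."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "length": len(url),
        "num_dots": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_query": int(bool(parsed.query)),
        "host_is_ip": int(host_is_ip(host)),
    }


def load_features(csv_text):
    """Read CSV text with 'url' and 'label' columns into (features, label) pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(url_features(row["url"]), row["label"]) for row in reader]


rows = load_features(SAMPLE_CSV)
for features, label in rows:
    print(label, features)
```

Feature dictionaries like these could then be loaded into a pandas DataFrame or passed to a machine learning library; which features are actually useful depends on the data set.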

Finally, the course introduces students to cutting-edge Big Data tools, including Apache Spark (PySpark), Apache Drill, and GPU-accelerated parallel computing frameworks, and demonstrates how to apply these techniques to extremely large datasets.

Learning Objectives

By the end of the course students will be able to:

  • Prepare security data for machine learning using the latest techniques
  • Understand the machine learning process
  • Extract features from security data sets
  • Apply a machine learning technique to solve a security problem
  • Construct a classifier using Python
  • Evaluate and assess the performance of a model
  • Create effective visualizations of security data
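
To make the objective of evaluating a model more concrete, here is a minimal sketch of how common evaluation metrics can be computed from a classifier's predictions. The labels and predictions are made up for illustration, and the function names (confusion_counts, evaluate) are hypothetical; a library such as scikit-learn provides equivalent metrics, but the sketch uses plain Python so each formula is visible.

```python
def confusion_counts(y_true, y_pred, positive="malicious"):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, positive="malicious"):
    """Return accuracy, precision, recall, and false positive rate."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Invented ground truth and predictions for eight samples.
y_true = ["malicious"] * 3 + ["benign"] * 5
y_pred = ["malicious", "malicious", "benign",
          "malicious", "benign", "benign", "benign", "benign"]

metrics = evaluate(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In security settings the false positive rate often matters as much as accuracy, because benign events usually far outnumber malicious ones and every false alarm costs analyst time.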

Feedback

If you would like to provide feedback for this course, please e-mail the NICCS SO at NICCS@hq.dhs.gov.