Bootcamp developers and instructors:
Saeed Aghabozorgi, PhD, Chief Data Scientist at Cognitive Class, IBM
Polong Lin, MSc, Data Scientist at Cognitive Class, IBM
Option 1: 4 days, Aug 12 to 15.
Option 2: 4 days, Aug 16 to 19 (repeat of Option 1).
Day 1 Morning: Introduction to Data Science
What is Data Science? Learn about the importance of data, machine learning, and big data. Find out about Cognitive Class, IBM's free online resource for data science education, and get a feel for popular open data science tools through IBM's Cognitive Class Labs platform, which includes Jupyter (IPython) Notebooks, RStudio IDE, and Apache Spark.
- Explore definitions of data science, paths to data science, R vs Python, data science tools, skills and technology, definition of cloud, Big Data, etc.
- Learn about data science in a business context
- Discover some business applications and use cases for data science
Day 1 Afternoon: Data Analysis with Python
Learn how to analyze data using Python. This section will take you from the basics of Python to exploring many different types of data. You will learn how to prepare data for analysis, perform simple statistical analyses, create meaningful data visualizations, predict future trends from data, and more!
- Importing and cleaning data sets
- Intro to the pandas, NumPy and SciPy libraries
- Data frame manipulation
- Summarizing the data
Day 2 Morning: Statistics for Data Science with Python
Learn the basics of statistics for data science through a lecture, labs, and an assignment. The lecture first covers the main concepts of statistics, including the Central Limit Theorem, the normal distribution, and descriptive statistics; you will then practice them in the labs.
- Descriptive Statistics Notebook: mean, median and standard deviation
- Histograms and Probability Mass Functions Notebook: calculate and display data
- Normal Distribution and probability density functions
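As a rough illustration of the quantities named in the notebooks above, the sketch below computes the mean, median, and standard deviation of a small sample and builds an empirical probability mass function from it. The sample values are made up for the example, and the code uses only the Python standard library rather than the notebooks' own material.

```python
import statistics
from collections import Counter

# Hypothetical sample (invented for illustration): daily counts over two weeks.
samples = [3, 5, 4, 5, 6, 5, 4, 3, 5, 7, 4, 5, 6, 5]

# Descriptive statistics.
mean = statistics.mean(samples)
median = statistics.median(samples)
std_dev = statistics.stdev(samples)  # sample standard deviation (divides by n - 1)

# Empirical probability mass function: relative frequency of each distinct value.
counts = Counter(samples)
pmf = {value: count / len(samples) for value, count in sorted(counts.items())}

print(f"mean={mean:.2f} median={median} std_dev={std_dev:.2f}")

# A text histogram: one '#' per observation, followed by the PMF value.
for value, prob in pmf.items():
    print(f"{value}: {'#' * counts[value]} ({prob:.2f})")
```

The PMF values sum to 1 by construction, which is a quick sanity check when building one by hand.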
Day 2 Afternoon: Data Visualization with Python
A picture is worth a thousand words - or should we say data points? In this section, we will go through how to plot the major graphs in Python. Learn how to plot bar graphs, line graphs, histograms, and more. Finally, learn how to create an interactive visualization of data using Plotly.
- Intro to the matplotlib and Plotly libraries
- Histograms, Bar graphs, Line graphs and Scatter plots
- Maps (creating maps using latitude, longitude data)
Day 3 Morning: Big Data with Python
You will learn how to work with Big Data using Apache Spark. Spark is an open-source engine for distributed data processing, and it can be used from Python. You will read data from a big dataset and apply preprocessing operations to it.
- Intro to Apache Spark
- Reading data from a big dataset
- Selecting data, filtering, and aggregating big data
Day 3 Afternoon: Machine Learning with Python
How can we get machines to learn from the data on their own? In this part you will get an overview of machine learning algorithms. To get hands-on practice with machine learning, you will work with real data sets and practice data mining techniques for prediction and classification. You will also learn how to choose the best algorithm for different problems in various domains and industries.
- Overview of Machine Learning
- Which ML algorithm is appropriate for my problem?
- Classification (Decision trees and KNN)
- Clustering (Hierarchical and k-means)
- Machine learning libraries, e.g
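To give a flavor of the KNN classification topic listed above, here is a minimal sketch of k-nearest-neighbors written with the Python standard library only. The function name `knn_predict`, the toy points, and the labels are all hypothetical and chosen for illustration; in the session itself a machine learning library would normally take the place of this hand-written version.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label,
    # then sort so the closest points come first.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # The most common label among the k neighbors is the prediction.
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy data (invented for illustration): two well-separated groups in 2D.
train_points = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0),
                (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
train_labels = ["small", "small", "small", "large", "large", "large"]

print(knn_predict(train_points, train_labels, (1.1, 1.0)))
print(knn_predict(train_points, train_labels, (3.9, 4.1)))
```

The choice of `k` matters: a small `k` follows the training data closely, while a larger `k` smooths the decision at the cost of blurring class boundaries.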
Day 4 Morning: Intro to Deep Learning with TensorFlow
Deep learning is a subset of machine learning that uses neural networks to model high-level abstractions in data, which enables data scientists to create models on complex, unstructured data like images and videos. In this session you will work with a specific type of deep learning model, the convolutional neural network, and use the TensorFlow library to build and train these networks.
- Intro to TensorFlow
- Logistic Regression with TensorFlow
- Convolutional Neural Networks
Day 4 Afternoon: Final exam (optional).
A half day is set aside for an optional exam to obtain a Verified IBM Badge.