Data Science for Software Engineers

This is the brand new version of the hugely popular “Data Science for Developers” training. After two years of teaching thousands of Engineers around the world, this training has been rebuilt from the ground up to cover more material, explore the theory in greater depth and put each topic in better context.

In this beginner-level course, we will start the day by establishing what Data Science is and how it is used by companies large and small. You will learn how to develop a Data Science project and how it differs from “normal” Software Engineering. Note that I use the term Data Science to encompass Machine Learning (ML), Exploratory Data Analysis (EDA), Data Mining, Analytics, Deep Learning, Artificial Intelligence (AI) and related fields.

Next we will cover the “three key phases of Data Science”: data cleaning, modelling and evaluation. With these fundamentals, along with the extensive practical worksheets, you will be able to undertake and succeed in a simple Data Science project.
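To make the three phases concrete, here is a minimal sketch in plain Python. It is not taken from the course worksheets: the toy data, the function names (clean, fit_line, mean_absolute_error) and the choice of a straight-line model are all assumptions made purely for illustration.

```python
from statistics import mean

# Hypothetical toy data: (hours_studied, exam_score).
# None marks a corrupted or missing reading.
raw = [(1.0, 52.0), (2.0, 55.0), (None, 70.0), (3.0, 55.0),
       (4.0, 58.0), (5.0, None), (6.0, 63.0)]


def clean(rows):
    """Phase 1 (data cleaning): drop rows that contain missing values."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def fit_line(rows):
    """Phase 2 (modelling): fit y = slope * x + intercept by least squares."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in rows)
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


def mean_absolute_error(rows, slope, intercept):
    """Phase 3 (evaluation): average absolute gap between prediction and truth."""
    return mean([abs(y - (slope * x + intercept)) for x, y in rows])


cleaned = clean(raw)
# Hold back the last cleaned row so the model is evaluated on unseen data.
train, test = cleaned[:4], cleaned[4:]
slope, intercept = fit_line(train)
mae = mean_absolute_error(test, slope, intercept)
print(f"slope={slope:.2f} intercept={intercept:.2f} held-out MAE={mae:.2f}")
```

In a real project each phase would typically use a dedicated library and far more data, but the shape of the workflow (clean, fit, then evaluate on data the model has not seen) stays the same.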

In the intermediate course on the second day, we will delve into the most important topics in Data Science. The aim is to provide enough breadth that you can pick and choose the techniques that suit your specific problem.

The content matches the tasks and topics that production Engineers face on a day-to-day basis. Indeed, surveys suggest that more than half of an Engineer’s time is spent finding, collecting, organising and cleaning data. Therefore, we spend a significant amount of time learning how to handle and understand data.

Another goal of the intermediate course is to give a broad understanding of as many models as possible in the time available. If you are aware of the major categories, types and instances of models, then you are better positioned to choose the most suitable model for the problem.

This training is unique, because nowhere else do you see Data Science laid bare. The materials emphasise the common themes between algorithms, which helps Data Science “click”. Mathematics is avoided as far as is practical, in favour of an intuitive understanding.

Who will benefit:

Engineers needing an introduction to Data Science.
People who want to understand the tools and technologies behind the hype.
Beginner Data Scientists wanting end-to-end practical experience and industry insight.

Prerequisites:

Some Python experience would be beneficial
Secondary School mathematics
A charged laptop with a browser that can connect to the internet

Topics:
* = Time permitting

Day 1: Introduction
- Applications
- Disciplines
- Lifecycle
Technical Overview
- Techniques
- Technologies
- Decisions
Phase 1: Introduction to Working With Data
- Visualising data
- Scaling data
- Dealing with corrupted data
Phase 2: Introduction to Modelling
- Classification
- Regression
- Clustering
Phase 3: Introduction to Evaluation
- Numerical evaluation
- Visual evaluation
Many in-depth practical examples demonstrating the day’s concepts

Day 2: Introduction
Probability
- Evidence
- Probabilities
- Probability distributions
- Summary statistics
Generalisation and Overfitting *
In-depth Data Cleaning
- Visualisation 2
- Data availability and consistency
- Types of data
- Corrupted data
- Transforming data
- Scaling data 2
- Feature engineering (derived data)
- Feature selection
- Time series data
- Related topics
In-depth model evaluation *
- Technical numerical evaluation *
- Business numerical evaluation *
- Technical visual evaluation and analysis *
- Business visual evaluation *
Dimensionality reduction
- PCA/SDA/LDA/QDA
- Manifold learning
Overview of models
- Classification
- Regression
- Clustering
Grand challenge *

Philip Winder

Helping businesses build data products

Dr. Philip Winder is a multidisciplinary Engineer who creates data-driven software products. His work incorporates Data Science, Cloud-Native and traditional software development using a range of languages and tools.

Phil is the CEO of Winder Research, a Data Science consultancy in the UK, which operates throughout Europe delivering training, development and consultancy services. He has a Ph.D. and a Master’s degree in Electronics from the University of Hull, UK.

Phil regularly speaks on Cloud-Native Data Science issues at conferences around the world. In addition to delivering public and private training, Phil is a certified trainer for O’Reilly and Pearson on the Safari platform. He is currently writing a book about Reinforcement Learning for O’Reilly Media.

His work can be seen at https://WinderResearch.com.