A comprehensive guide to get you up to speed with the latest developments in practical machine learning with Python and upgrade your understanding of machine learning (ML) algorithms and techniques
▶What You Will Learn
⦁Understand the important concepts in ML and data science
⦁Use Python to explore the world of data mining and analytics
⦁Scale up model training on data of varying complexity with Apache Spark
⦁Delve deep into text analysis and NLP using Python libraries such as NLTK and Gensim
⦁Select and build an ML model and evaluate and optimize its performance
⦁Implement ML algorithms from scratch in Python, TensorFlow 2, PyTorch, and scikit-learn
▶Key Features
⦁Dive into machine learning algorithms to solve the complex challenges faced by data scientists today
⦁Explore cutting-edge content reflecting developments in deep learning and reinforcement learning
⦁Use updated Python libraries such as TensorFlow, PyTorch, and scikit-learn to track machine learning projects end-to-end
▶Who This Book Is For
If you're a machine learning enthusiast, data analyst, or data engineer who is passionate about machine learning and wants to start working on machine learning projects, this book is for you.
Prior knowledge of Python coding is assumed. Basic familiarity with statistical concepts will be beneficial but is not necessary.
▶What This Book Covers
⦁ Chapter 1, Getting Started with Machine Learning and Python, will kick off your Python machine learning journey. It will start with what machine learning is, why we need it, and how it has evolved over the last few decades. It will then discuss typical machine learning tasks and explore several essential techniques for working with data and models, in a practical and fun way. You will also set up the software and tools needed for the examples and projects in the upcoming chapters.
⦁ Chapter 2, Building a Movie Recommendation Engine with Naïve Bayes, will focus on classification, specifically binary classification and Naïve Bayes. The goal of the chapter is to build a movie recommendation system. You will learn the fundamental concepts of classification, as well as Naïve Bayes, a simple yet powerful algorithm. The chapter will also demonstrate how to fine-tune a model, an important skill for every data science or machine learning practitioner.
⦁ Chapter 3, Recognizing Faces with Support Vector Machine, will continue the journey of supervised learning and classification. Specifically, it will focus on multiclass classification and support vector machine classifiers. It will discuss how the support vector machine algorithm searches for a decision boundary that separates data from different classes. You will also implement the algorithm with scikit-learn and apply it to various real-life problems, including face recognition.
⦁ Chapter 4, Predicting Online Ad Click-Through with Tree-Based Algorithms, will introduce and explain tree-based algorithms in depth (including decision trees, random forests, and boosted trees) over the course of solving the advertising click-through rate problem. You will explore decision trees from the root to the leaves, and implement tree models from scratch as well as with scikit-learn and XGBoost. Feature importance, feature selection, and ensembling will also be covered.
⦁ Chapter 5, Predicting Online Ad Click-Through with Logistic Regression, will be a continuation of the ad click-through prediction project, with a focus on a very scalable classification model—logistic regression. You will explore how logistic regression works, and how to work with large datasets. The chapter will also cover categorical variable encoding, L1 and L2 regularization, feature selection, online learning, and stochastic gradient descent.
⦁ Chapter 6, Scaling Up Prediction to Terabyte Click Logs, will present a more scalable solution to massive ad click prediction, using powerful parallel computing tools including Apache Hadoop and Spark. It will cover the essential concepts of Spark, such as installation, RDDs, and core programming, as well as its ML components. You will work with the entire ad click dataset, build classification models, and perform feature engineering and performance evaluation using Spark.
⦁ Chapter 7, Predicting Stock Prices with Regression Algorithms, will focus on several popular regression algorithms, including linear regression, regression trees and regression forests, and support vector regression. It will encourage you to use them to tackle a billion (or trillion) dollar problem: stock price prediction. You will practice solving regression problems using scikit-learn and TensorFlow.
⦁ Chapter 8, Predicting Stock Prices with Artificial Neural Networks, will introduce and explain neural network models in depth. It will cover the building blocks of neural networks and important concepts such as activation functions, feedforward, and backpropagation. You will start by building the simplest neural network and go deeper by adding more layers to it. You will implement neural networks from scratch, use TensorFlow and Keras, and train a neural network to predict stock prices.
⦁ Chapter 9, Mining the 20 Newsgroups Dataset with Text Analysis Techniques, will begin the second part of your learning journey: unsupervised learning. It will tackle a natural language processing problem by exploring the newsgroups data. You will gain hands-on experience in working with text data, especially how to convert words and phrases into machine-readable values and how to clean up words with little meaning. You will also visualize text data using a dimensionality reduction technique called t-SNE.
⦁ Chapter 10, Discovering Underlying Topics in the Newsgroups Dataset with Clustering and Topic Modeling, will talk about identifying different groups of observations in data in an unsupervised manner. You will cluster the newsgroups data using the K-means algorithm, and detect topics using non-negative matrix factorization and latent Dirichlet allocation. You will be amazed by how many interesting themes you are able to mine from the 20 newsgroups dataset!
⦁ Chapter 11, Machine Learning Best Practices, aims to solidify your learning and get you ready for real-world projects. It includes 21 best practices to follow throughout the entire machine learning workflow.
⦁ Chapter 12, Categorizing Images of Clothing with Convolutional Neural Networks, will be about using convolutional neural networks (CNNs), a very powerful modern machine learning model, to classify images of clothing. It will cover the building blocks and architecture of CNNs, and their implementation using TensorFlow and Keras. After exploring the clothing image data, you will develop CNN models to categorize the images into ten classes, and use data augmentation techniques to improve the classifier.
⦁ Chapter 13, Making Predictions with Sequences using Recurrent Neural Networks, will start by defining sequential learning, and exploring how recurrent neural networks (RNNs) are well suited for it. You will learn about various types of RNNs and their common applications. You will implement RNNs with TensorFlow, and apply them to solve two interesting sequential learning problems: sentiment analysis on IMDb movie reviews and text auto-generation. Finally, as a bonus section, it will cover the Transformer as a state-of-the-art sequential learning model.
⦁ Chapter 14, Making Decisions in Complex Environments with Reinforcement Learning, will be about learning from experience and interacting with the environment. After covering the fundamentals of reinforcement learning, you will explore the FrozenLake environment with a simple dynamic programming algorithm. You will learn about Monte Carlo learning and use it for value approximation and control. You will also develop temporal difference algorithms and use Q-learning to solve the taxi problem.