
Python Feature Engineering Cookbook

Over 70 recipes for creating, engineering, and transforming features to build machine learning models

Purchase type
  • Own (permanent purchase)
E-book list price
  • 21,000 KRW
Sale price
  • 21,000 KRW
Publication
  • E-book released 2020.01.22
Listening
  • TTS (text-to-speech) supported
File information
  • PDF
  • 364 pages
  • 8.5 MB
Supported environments
  • PC viewer
  • PAPER
ISBN
  • 9781789807820
UCI
  • -

Book Information

Extract accurate information from data to train and improve machine learning models using NumPy, SciPy, pandas, and scikit-learn libraries

▶Book Description
Feature engineering is invaluable for developing and enriching your machine learning models. In this cookbook, you will work with the best tools to streamline your feature engineering pipelines and techniques, and to simplify and improve the quality of your code.

Using Python libraries such as pandas, scikit-learn, Featuretools, and Feature-engine, you'll learn how to work with both continuous and discrete datasets and be able to transform features from unstructured datasets. You will develop the skills necessary to select the best features as well as the most suitable extraction techniques. This book will cover Python recipes that will help you automate feature engineering to simplify complex processes. You'll also get to grips with different feature engineering strategies, such as the Box-Cox transform, power transform, and log transform, across machine learning, reinforcement learning, and natural language processing (NLP) domains.

By the end of this book, you'll have discovered tips and practical solutions to all of your feature engineering problems.

▶What You Will Learn
⦁ Simplify your feature engineering pipelines with powerful Python packages
⦁ Get to grips with imputing missing values
⦁ Encode categorical variables with a wide set of techniques
⦁ Extract insights from text quickly and effortlessly
⦁ Develop features from transactional data and time series data
⦁ Derive new features by combining existing variables
⦁ Understand how to transform, discretize, and scale your variables
⦁ Create informative variables from date and time

▶Key Features
⦁ Discover solutions for feature generation, feature extraction, and feature selection
⦁ Uncover the end-to-end feature engineering process across continuous, discrete, and unstructured datasets
⦁ Implement modern feature extraction techniques using Python's pandas, scikit-learn, SciPy, and NumPy libraries

▶Who This Book Is For
This book is for machine learning professionals, AI engineers, data scientists, and NLP and reinforcement learning engineers who want to optimize and enrich their machine learning models with the best features. Knowledge of machine learning and Python coding will assist you with understanding the concepts covered in this book.

▶What this book covers
⦁ Chapter 1, Foreseeing Variable Problems in Building ML Models, covers how to identify the different problems that variables may present and that challenge machine learning algorithm performance. We'll learn how to identify missing data in variables, quantify the cardinality of the variable, and much more besides.
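
As a rough sketch of the kind of diagnostics this chapter is concerned with, the snippet below uses pandas to measure missing data and cardinality on a made-up DataFrame; the column names are hypothetical and the code is not taken from the book.

    import pandas as pd

    # Hypothetical dataset; the book works with its own example datasets.
    df = pd.DataFrame({
        "age": [25, None, 40, 31],
        "city": ["London", "Paris", "London", None],
    })

    # Fraction of missing values per variable.
    print(df.isnull().mean())

    # Cardinality: number of distinct values per variable.
    print(df.nunique())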

⦁ Chapter 2, Imputing Missing Data, explains how to engineer variables that show missing information for some observations. In a typical dataset, variables will display values for certain observations, while values will be missing for other observations. We'll introduce various techniques to fill in those missing values, along with the code to execute each technique.
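
By way of illustration only, one common way to perform this kind of imputation is scikit-learn's SimpleImputer; the small array below is made up, and the snippet is a sketch of the general idea rather than one of the book's recipes.

    import numpy as np
    from sklearn.impute import SimpleImputer

    # Hypothetical feature matrix with missing entries.
    X = np.array([[25.0, 50000.0],
                  [np.nan, 62000.0],
                  [40.0, np.nan]])

    # Replace missing entries with the median of each column.
    imputer = SimpleImputer(strategy="median")
    X_imputed = imputer.fit_transform(X)
    print(X_imputed)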

⦁ Chapter 3, Encoding Categorical Variables, introduces various classical and widely used techniques to transform categorical variables into numerical variables, demonstrates a technique for reducing the dimensionality of variables with high cardinality, and shows how to tackle infrequent values. This chapter also includes more complex techniques for encoding categorical variables, as described and used in the 2009 KDD competition.
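
As a minimal sketch of one of the simpler encoding techniques mentioned here, one-hot encoding with pandas could look like this; the variable and category names are invented for illustration and this is not the book's code.

    import pandas as pd

    # Hypothetical categorical variable.
    df = pd.DataFrame({"colour": ["blue", "red", "blue", "green"]})

    # One-hot encode the variable into binary indicator columns.
    encoded = pd.get_dummies(df["colour"], prefix="colour")
    print(encoded)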

⦁ Chapter 4, Transforming Numerical Variables, uses various recipes to transform numerical variables, typically non-Gaussian, into variables that follow a more Gaussian-like distribution by applying multiple mathematical functions.
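
For instance, the log and Box-Cox transforms named in the book description can be applied with NumPy and SciPy roughly as follows; the data is made up and this is only an illustrative sketch, not the book's code.

    import numpy as np
    from scipy import stats

    # Skewed, strictly positive data (hypothetical example).
    x = np.array([1.0, 2.0, 2.5, 3.0, 50.0])

    # Log transform.
    x_log = np.log(x)

    # Box-Cox transform; SciPy also returns the fitted lambda parameter.
    x_boxcox, fitted_lambda = stats.boxcox(x)
    print(x_log, x_boxcox, fitted_lambda)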

⦁ Chapter 5, Performing Variable Discretization, covers how to create bins and distribute the values of the variables across them. The aim of this technique is to improve the spread of values across a range. It includes well-established and frequently used techniques like equal-width and equal-frequency discretization, as well as more complex processes like discretization with decision trees, among others.
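
A rough sketch of equal-width versus equal-frequency discretization with pandas, on invented data (not one of the book's recipes), follows.

    import pandas as pd

    # Hypothetical numerical variable.
    s = pd.Series([1, 7, 5, 4, 6, 3, 12, 8])

    # Equal-width discretization: bins spanning ranges of equal size.
    equal_width = pd.cut(s, bins=3)

    # Equal-frequency discretization: bins holding roughly equal counts.
    equal_freq = pd.qcut(s, q=3)
    print(equal_width.value_counts(), equal_freq.value_counts())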

⦁ Chapter 6, Working with Outliers, teaches a few mainstream techniques to remove outliers from the variables in the dataset. We'll also learn how to cap outliers at a given arbitrary minimum/maximum value.
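
One simple way to cap values at arbitrary minimum and maximum bounds is pandas' clip, sketched below on made-up data; this is an illustration of the idea, not the book's code.

    import pandas as pd

    # Hypothetical variable containing outliers.
    s = pd.Series([3, 5, 250, 7, -40, 6])

    # Cap values at arbitrary lower and upper bounds.
    capped = s.clip(lower=0, upper=100)
    print(capped)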

⦁ Chapter 7, Deriving Features from Dates and Time Variables, describes how to create features from dates and time variables. Date variables can't be used as they are to build machine learning models, for multiple reasons. We'll learn how to combine information from multiple time variables, such as calculating the time elapsed between them, and, importantly, how to work with variables in different time zones.
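
As an illustration of the kind of features that can be pulled out of datetime variables with pandas (the column names are hypothetical; this is not the book's code):

    import pandas as pd

    # Hypothetical datetime variables.
    df = pd.DataFrame({
        "signup": pd.to_datetime(["2019-01-15 08:00", "2019-03-02 17:30"]),
        "first_purchase": pd.to_datetime(["2019-01-20 10:00", "2019-03-05 09:00"]),
    })

    # Extract calendar parts from a datetime variable.
    df["signup_month"] = df["signup"].dt.month
    df["signup_dayofweek"] = df["signup"].dt.dayofweek

    # Time elapsed between two datetime variables, in days.
    df["days_to_purchase"] = (df["first_purchase"] - df["signup"]).dt.days
    print(df)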

⦁ Chapter 8, Performing Feature Scaling, covers the methods that we can use to put variables on the same scale. We'll learn how to standardize variables, how to scale them to their minimum and maximum values, and how to perform mean normalization or scale to the vector norm, among other techniques.
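
A minimal sketch of two of these scaling techniques with scikit-learn, on made-up data (not one of the book's recipes):

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler, StandardScaler

    # Hypothetical feature matrix with columns on very different scales.
    X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])

    # Standardization: zero mean, unit variance per column.
    X_std = StandardScaler().fit_transform(X)

    # Min-max scaling: rescale each column to the [0, 1] range.
    X_minmax = MinMaxScaler().fit_transform(X)
    print(X_std, X_minmax)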

⦁ Chapter 9, Applying Mathematical Computations to Features, explains how to create new variables from existing ones by utilizing different mathematical computations. We'll learn how to create new features through the addition/difference/multiplication/division of existing variables and more. We will also learn how to expand the feature space with polynomial expansion and how to combine features using decision trees.
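
For example, deriving a ratio feature and expanding the feature space with polynomial terms might look roughly like this with pandas and scikit-learn; the column names are invented and this is only a sketch, not the book's code.

    import pandas as pd
    from sklearn.preprocessing import PolynomialFeatures

    # Hypothetical variables.
    df = pd.DataFrame({"income": [30000, 45000], "debt": [5000, 20000]})

    # New feature from the ratio of two existing variables.
    df["debt_to_income"] = df["debt"] / df["income"]

    # Polynomial expansion: adds squares and pairwise products of the inputs.
    poly = PolynomialFeatures(degree=2, include_bias=False)
    expanded = poly.fit_transform(df[["income", "debt"]])
    print(df, expanded.shape)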

⦁ Chapter 10, Creating Features with Transactional and Time Series Data, covers how to create static features from transactional information, so that we obtain a static view of a customer, or client, at any point in time. We'll learn how to aggregate features using math operations across transactions in specific time windows, and how to capture the time between transactions. We'll also discuss how to determine the time between special events. We'll briefly dive into signal processing and learn how to determine and quantify local maxima and local minima.
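
As a rough sketch of collapsing transactions into static, per-customer features with pandas (hypothetical column names; not the book's code):

    import pandas as pd

    # Hypothetical transaction log.
    transactions = pd.DataFrame({
        "customer_id": [1, 1, 1, 2, 2],
        "timestamp": pd.to_datetime([
            "2020-01-01", "2020-01-05", "2020-01-20", "2020-01-02", "2020-01-03",
        ]),
        "amount": [20.0, 35.0, 10.0, 100.0, 80.0],
    })

    # Static view of each customer: aggregate transactions with math operations.
    static = transactions.groupby("customer_id")["amount"].agg(["mean", "max", "sum"])

    # Time between consecutive transactions, per customer.
    transactions = transactions.sort_values(["customer_id", "timestamp"])
    transactions["days_since_prev"] = (
        transactions.groupby("customer_id")["timestamp"].diff().dt.days
    )
    print(static, transactions)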

⦁ Chapter 11, Extracting Features from Text Variables, explains how to derive features from text variables. We'll learn how to capture the complexity of the text by counting the number of characters, words, and sentences, and by measuring the vocabulary and lexical variety. We'll also learn how to create a bag of words and how to implement TF-IDF, with and without n-grams.
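
A minimal sketch of some of these text features, using pandas and scikit-learn on two toy sentences (not one of the book's recipes):

    import pandas as pd
    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

    # Hypothetical text variable.
    texts = pd.Series([
        "Feature engineering improves machine learning models",
        "Text variables need their own feature engineering",
    ])

    # Simple complexity features: counts of characters and words.
    n_chars = texts.str.len()
    n_words = texts.str.split().str.len()

    # Bag of words and TF-IDF representations (here with unigrams and bigrams).
    bow = CountVectorizer().fit_transform(texts)
    tfidf = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(texts)
    print(n_chars.tolist(), n_words.tolist(), bow.shape, tfidf.shape)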

Author Information

▶About the Author
- Soledad Galli
Soledad Galli is a lead data scientist with more than 10 years of experience in world-class academic institutions and renowned businesses. She has researched, developed, and put into production machine learning models for insurance claims, credit risk assessment, and fraud prevention. Soledad received a Data Science Leaders' award in 2018 and was named one of LinkedIn's voices in data science and analytics in 2019. She is passionate about enabling people to step into and excel in data science, which is why she mentors data scientists and speaks regularly at data science meetings. She also teaches online machine learning courses on a prestigious Massive Open Online Course platform, which have reached more than 10,000 students worldwide.
