Hands-On Data Analysis with Pandas
Efficiently perform data collection, wrangling, analysis, and visualization using Python
- Publication information
- E-book published 2019.07.26
▶Book Description
Data analysis has become a necessary skill in a variety of positions where knowing how to work with data and extract insights can generate significant value.
Hands-On Data Analysis with Pandas will show you how to analyze your data, get started with machine learning, and work effectively with Python libraries often used for data science, such as pandas, NumPy, matplotlib, seaborn, and scikit-learn. Using real-world datasets, you will learn how to use the powerful pandas library to perform data wrangling to reshape, clean, and aggregate your data. Then, you will learn how to conduct exploratory data analysis by calculating summary statistics and visualizing the data to find patterns. In the concluding chapters, you will explore some applications of anomaly detection, regression, clustering, and classification, using scikit-learn, to make predictions based on past data.
By the end of this book, you will be equipped with the skills you need to use pandas to ensure the veracity of your data, visualize it for effective decision-making, and reliably reproduce analyses across multiple datasets.
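As a rough illustration of the wrangling steps described above (cleaning, reshaping, aggregating, and computing summary statistics), here is a minimal pandas sketch. The dataset, column names, and values are invented for this example and are not taken from the book.

```python
import pandas as pd

# Hypothetical daily temperature readings; the values are made up for illustration.
raw = pd.DataFrame({
    "date": ["2019-01-01", "2019-01-01", "2019-01-02", "2019-01-02", "2019-01-02"],
    "station": ["A", "B", "A", "B", "B"],
    "temp_c": [1.5, None, 3.0, 2.0, 2.0],
})

# Clean: parse dates, drop exact duplicate rows, and drop rows with missing readings.
clean = (
    raw.assign(date=pd.to_datetime(raw["date"]))
    .drop_duplicates()
    .dropna(subset=["temp_c"])
)

# Reshape: pivot from long format to one column per station.
wide = clean.pivot(index="date", columns="station", values="temp_c")

# Aggregate: mean temperature per station.
mean_by_station = clean.groupby("station")["temp_c"].mean()

# Summarize: count, mean, standard deviation, min, quartiles, and max.
summary = clean["temp_c"].describe()

print(wide)
print(mean_by_station)
print(summary)
```

Chaining the cleaning steps keeps the raw data untouched, which makes it easier to rerun the same analysis on another dataset.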
▶What You Will Learn
- Understand how data analysts and scientists gather and analyze data
- Perform data analysis and data wrangling in Python
- Combine, group, and aggregate data from multiple sources
- Create data visualizations with pandas, matplotlib, and seaborn
- Apply machine learning (ML) algorithms to identify patterns and make predictions
- Use Python data science libraries to analyze real-world datasets
- Use pandas to solve common data representation and analysis problems
- Build Python scripts, modules, and packages for reusable analysis code
- Perform efficient data analysis and manipulation tasks using pandas
- Apply pandas to different real-world domains using step-by-step demonstrations
- Get accustomed to using pandas as an effective data exploration tool
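To give a sense of what combining, grouping, and aggregating data from multiple sources can look like, here is a short sketch using `DataFrame.merge` and `groupby`. The two tables and their columns (`station_id`, `city`, `temp_c`) are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Two hypothetical sources: station metadata and per-station readings.
stations = pd.DataFrame({
    "station_id": ["S1", "S2", "S3"],
    "city": ["NYC", "NYC", "Boston"],
})
readings = pd.DataFrame({
    "station_id": ["S1", "S1", "S2", "S3", "S4"],
    "temp_c": [2.0, 4.0, 6.0, 1.0, 9.0],
})

# Combine: a left join keeps every reading; indicator=True adds a "_merge"
# column recording whether each row found a match in the metadata.
merged = readings.merge(stations, on="station_id", how="left", indicator=True)

# The reading from S4 has no metadata, so it is marked "left_only".
unmatched = merged[merged["_merge"] == "left_only"]

# Group and aggregate: several statistics per city, using named aggregation.
by_city = (
    merged.dropna(subset=["city"])
    .groupby("city")
    .agg(mean_temp=("temp_c", "mean"), n_readings=("temp_c", "size"))
)

print(unmatched)
print(by_city)
```

Checking the `_merge` column before aggregating is one way to notice rows that would otherwise silently drop out of the result.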
▶Who This Book Is For
This book is for data analysts, data science beginners, and Python developers who want to explore each stage of data analysis and scientific computing using a wide range of datasets. You will also find this book useful if you are a data scientist who is looking to use pandas in machine learning. A working knowledge of the Python programming language will be beneficial.
▶What This Book Covers
- Chapter 1, Introduction to Data Analysis, teaches you the fundamentals of data analysis, gives you a foundation in statistics, and guides you through getting your environment set up for working with data in Python and using Jupyter Notebooks.
- Chapter 2, Working with Pandas DataFrames, introduces you to the pandas library and shows you the basics of working with DataFrames.
- Chapter 3, Data Wrangling with Pandas, discusses the process of data manipulation, shows you how to explore an API to gather data, and guides you through data cleaning and reshaping with pandas.
- Chapter 4, Aggregating Pandas DataFrames, teaches you how to query and merge DataFrames, how to perform complex operations on them, including rolling calculations and aggregations, and how to work effectively with time series data.
- Chapter 5, Visualizing Data with Pandas and Matplotlib, shows you how to create your own data visualizations in Python, first using the matplotlib library, and then from pandas objects directly.
- Chapter 6, Plotting with Seaborn and Customization Techniques, continues the discussion on data visualization by teaching you how to use the seaborn library to visualize your long-form data and giving you the tools you need to customize your visualizations, making them presentation-ready.
- Chapter 7, Financial Analysis – Bitcoin and the Stock Market, walks you through the creation of a Python package for analyzing stocks, building upon everything learned from Chapter 1, Introduction to Data Analysis, through Chapter 6, Plotting with Seaborn and Customization Techniques, and applying it to a financial application.
- Chapter 8, Rule-Based Anomaly Detection, covers simulating data and applying everything learned from Chapter 1, Introduction to Data Analysis, through Chapter 6, Plotting with Seaborn and Customization Techniques, to catch hackers attempting to authenticate to a website, using rule-based strategies for anomaly detection.
- Chapter 9, Getting Started with Machine Learning in Python, introduces you to machine learning and building models using the scikit-learn library.
- Chapter 10, Making Better Predictions – Optimizing Models, shows you strategies for tuning and improving the performance of your machine learning models.
- Chapter 11, Machine Learning Anomaly Detection, revisits anomaly detection on login attempt data, using machine learning techniques, all while giving you a taste of how the workflow looks in practice.
- Chapter 12, The Road Ahead, contains resources for taking your skills to the next level and further avenues for exploration.
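The rule-based anomaly detection mentioned for Chapter 8 can be pictured with a small sketch like the one below. It is not the book's method: the login log, the column names, and both thresholds (`MIN_ATTEMPTS`, `MAX_FAILURE_RATE`) are assumptions made up for this example.

```python
import pandas as pd

# Hypothetical login log; the IPs and outcomes are invented for illustration.
log = pd.DataFrame({
    "source_ip": ["10.0.0.1"] * 3 + ["10.0.0.2"] * 6,
    "success": [True, False, True, False, False, False, False, False, True],
})
log["failed"] = ~log["success"]

# Aggregate per IP: total attempts and number of failed attempts.
per_ip = log.groupby("source_ip").agg(
    attempts=("failed", "size"),
    failures=("failed", "sum"),
)
per_ip["failure_rate"] = per_ip["failures"] / per_ip["attempts"]

# Assumed rule: flag IPs with at least 5 attempts and a failure rate above 50%.
MIN_ATTEMPTS = 5
MAX_FAILURE_RATE = 0.5
per_ip["flagged"] = (per_ip["attempts"] >= MIN_ATTEMPTS) & (
    per_ip["failure_rate"] > MAX_FAILURE_RATE
)

flagged_ips = per_ip.index[per_ip["flagged"]].tolist()
print(per_ip)
print(flagged_ips)
```

Fixed thresholds like these are easy to explain but need tuning by hand, which is part of the motivation for the machine learning approach described for Chapter 11.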
Data science is often described as an interdisciplinary field where programming skills, statistical know-how, and domain knowledge intersect. It has quickly become one of the most in-demand fields, and knowing how to work with data has become essential in many of today's careers. Regardless of the industry, role, or project, data skills are in high demand, and learning data analysis is the key to making an impact.
Roles in data science span a broad spectrum: data analysts focus more on extracting business insights, while data scientists focus more on applying machine learning techniques to business problems. Data engineers focus on designing, building, and maintaining the data pipelines used by data analysts and scientists. Machine learning engineers share much of the skill set of the data scientist and, like data engineers, are adept software engineers. The data science landscape encompasses many fields, but for all of them, data analysis is a fundamental building block. This book will give you the skills to get started, wherever your journey may take you.
The traditional skill set in data science involves knowing how to collect data from various sources, such as databases and APIs, and process it. Python is a popular language for data science that provides the means to collect and process data, as well as to build production-quality data products. Since it is open source, it is easy to get started with data science by taking advantage of libraries written by others to solve common data tasks and issues.
Pandas is a powerful and popular library that has become synonymous with data science in Python. This book will give you a hands-on introduction to data analysis using pandas on real-world datasets, such as those dealing with the stock market, simulated hacking attempts, weather trends, earthquakes, wine, and astronomical data. Pandas makes data wrangling and visualization easy by giving us the ability to work efficiently with tabular data.
Once we have learned how to conduct data analysis, we will explore a number of applications. We will build Python packages and try our hand at stock analysis, anomaly detection, regression, clustering, and classification with the help of additional libraries commonly used for data visualization, data wrangling, and machine learning, such as matplotlib, seaborn, NumPy, and scikit-learn. By the time you finish this book, you will be well equipped to take on your own data science projects in Python.
▶About the Author
- Stefanie Molin
Stefanie Molin is a data scientist and software engineer at Bloomberg LP in NYC, tackling tough problems in information security, particularly revolving around anomaly detection, building tools for gathering data, and knowledge sharing. She has extensive experience in data science, designing anomaly detection solutions, and utilizing machine learning in both R and Python in the AdTech and FinTech industries. She holds a B.S. in operations research from Columbia University's Fu Foundation School of Engineering and Applied Science, with minors in economics, and entrepreneurship and innovation. In her free time, she enjoys traveling the world, inventing new recipes, and learning new languages spoken among both people and computers.
▶Table of Contents
1 Introduction to Data Analysis
2 Working with Pandas DataFrames
3 Data Wrangling with Pandas
4 Aggregating Pandas DataFrames
5 Visualizing Data with Pandas and Matplotlib
6 Plotting with Seaborn and Customization Techniques
7 Financial Analysis – Bitcoin and the Stock Market
8 Rule-Based Anomaly Detection
9 Getting Started with Machine Learning in Python
10 Making Better Predictions – Optimizing Models
11 Machine Learning Anomaly Detection
12 The Road Ahead