▶Book Description
Data science is an ever-evolving field that is growing rapidly in popularity. It draws on techniques and theories from statistics, computer science, and, most importantly, machine learning, databases, and data visualization.
This book takes you through an entire journey of statistics, from knowing very little to becoming comfortable using various statistical methods for data science tasks. It starts off with simple statistics and then moves on to the statistical methods used in data science algorithms. The R programs for statistical computation are clearly explained, along with the logic behind them. You will come across various mathematical concepts, such as variance, standard deviation, probability, matrix calculations, and more. You will learn only what is required to implement statistics in data science tasks such as data cleaning, mining, and analysis. You will learn the statistical techniques required to perform tasks such as linear regression, regularization, model assessment, boosting, SVMs, and working with neural networks.
By the end of the book, you will be comfortable with performing various statistical computations for data science programmatically.
▶What You Will Learn
- Analyze the transition from a data developer to a data scientist mindset
- Get acquainted with the R programs and the logic used for statistical computations
- Understand mathematical concepts such as variance, standard deviation, probability, matrix calculations, and more
- Learn to implement statistics in data science tasks such as data cleaning, mining, and analysis
- Learn the statistical techniques required to perform tasks such as linear regression, regularization, model assessment, boosting, SVMs, and working with neural networks
- Get comfortable with performing various statistical computations for data science programmatically
▶About This Book
- No need to take a degree in statistics: read this book and get a strong statistics base for data science and real-world programs
- Implement statistics in data science tasks such as data cleaning, mining, and analysis
- Learn all about probability, statistics, numerical computations, and more with the help of R programs
▶Who This Book Is For
This book is intended for developers who want to enter the field of data science and are looking for concise information on statistics, supported by insightful programs and simple explanations. Some basic hands-on experience with R will be useful.
▶Style and approach
A comprehensive, step-by-step guide with real-world examples.
▶What this book covers
- Chapter 1: Transitioning from Data Developer to Data Scientist, sets the stage for the transition from data developer to data scientist. You will understand the important differences between a developer mindset and a data scientist mindset, and how to transition into thinking like a data scientist.
- Chapter 2: Declaring the Objectives, explains (from a developer’s perspective) the basic objectives behind statistics for data science and introduces you to the important terms and key concepts used in the field of data science.
- Chapter 3: A Developer's Approach to Data Cleaning, discusses how a developer might understand and approach the topic of data cleaning using common statistical methods.
- Chapter 4: Data Mining and the Database Developer, introduces the developer to mining data using R. You will understand what data mining is and why it is important, and you will feel comfortable using R for the most common statistical data mining methods: dimensionality reduction, frequent patterns, and sequences.
- Chapter 5: Statistical Analysis for the Database Developer, discusses the difference between data analysis (or summarization) and statistical data analysis. It then follows the steps for successful statistical analysis of data: describing the nature of the data, exploring the relationships presented in the data, creating a summarization model from the data, proving the validity of the model, and employing predictive analytics on the developed model.
- Chapter 6: Database Progression to Database Regression, sets out to define statistical regression concepts and outline how a developer might use regression for simple forecasting and prediction within a typical data development project.
- Chapter 7: Regularization for Database Improvement, introduces the developer to the idea of statistical regularization to improve data models. You will review what statistical regularization is, why it is important, and various statistical regularization methods.
- Chapter 8: Data Development and Assessment, covers the idea of data model assessment and using statistics for assessment. You will understand what statistical assessment is, why it is important, and use R for statistical assessment.
- Chapter 9: Databases and Neural Networks, defines the neural network model and draws from a developer’s knowledge of data models to help understand the purpose and use of neural networks in data science.
- Chapter 10: Boosting and your Database, introduces the idea of using statistical boosting to better understand data in a database.
- Chapter 11: Database Classification using Support Vector Machines, uses developer terminology to define an SVM, identifies various applications for its use, and walks through an example of using a simple SVM to classify data in a database.
- Chapter 12: Database Structures and Machine Learning, aims to provide an explanation of the types of machine learning and shows the developer how to use machine learning processes to understand database mappings and identify patterns within the data.