Name: Hands-On Data Science with R
Price: 17000 KRW
Availability: OnlineOnly
Author: Vitor Bianchi Lanzetta, Nataraj Dasgupta, Ricardo Anjoleto Farias

Hands-On Data Science with R: Details Page

Publication Information

E-book published 2018.11.30

Listening Feature

TTS (text-to-speech) not supported

File Information

PDF
414 pages
26.6MB

Supported Platforms

App
Web
PC viewer
PAPER

ISBN

9781789135831

UCI

Hands-On Data Science with R

About the Book

▶Book Description
R is among the most widely used programming languages for data analysis, and when used in association with data science, this powerful combination helps tackle the complexities of unstructured real-world datasets. This book covers the entire data science ecosystem for aspiring data scientists, taking you from zero to a level where you are confident enough to get hands-on with real-world data science problems.

The book starts with an introduction to data science and introduces readers to popular R libraries for executing routine data science tasks. It covers all the important processes in data science, such as gathering data, cleaning it, and then uncovering patterns in it. You will explore machine learning algorithms, predictive analytical models, and finally deep learning algorithms. You will learn to use the most powerful visualization packages available in R so that you can easily derive insights from your data.

Towards the end, you will also learn how to integrate R with Spark and Hadoop and perform large-scale data analytics without much complexity.

▶What You Will Learn
⦁ Understand the R programming language and its ecosystem of packages for data science
⦁ Obtain and clean your data before processing
⦁ Master essential exploratory techniques for summarizing data
⦁ Examine various machine learning prediction models
⦁ Explore the H2O analytics platform in R for deep learning
⦁ Apply data mining techniques to available datasets
⦁ Work with interactive visualization packages in R
⦁ Integrate R with Spark and Hadoop for large-scale data analytics

▶Key Features
⦁ Explore the popular R packages for data science
⦁ Use R for efficient data mining, text analytics and feature engineering
⦁ Become a well-rounded data science professional with the help of hands-on examples and use cases in R

▶Who This Book Is For
If you are a budding data scientist keen to learn about the popular R packages for data science, or a developer looking to step into the world of data analysis, this book is the ideal resource you need to get started. Some programming experience will be helpful to get the most out of this book.

▶What This Book Covers
⦁ Chapter 1, Getting Started with Data Science and R, provides an introduction to the field of data science, its applicability in different industry domains, an overview of the machine learning process, and how to install RStudio in order to get started with R development. It also introduces the reader to programming in R, starting off at an intermediate level to facilitate an analysis of the Human Development Index (HDI), published by the United Nations Development Programme. The HDI signifies the level of economic development of a state, including general public health, education, and various other societal factors.

⦁ Chapter 2, Descriptive and Inferential Statistics, introduces fundamental statistical analysis using R, including techniques to perform random sampling, hypothesis testing, and nonparametric tests. This chapter contains extensive examples of R commands for performing common analyses, such as t-tests and z-tests, and makes use of some well-known statistical packages, such as Hmisc.

⦁ Chapter 3, Data Wrangling with R, provides an introduction to packages available in R to slice and manipulate data. It introduces packages from the tidyverse, such as dplyr, and, more generally, the apply family of functions in R. The chapter is example-heavy: several examples guide the reader on how to apply the functions in the respective packages.

⦁ Chapter 4, KDD, Data Mining, and Text Mining, includes extensive discussions on extracting information from unstructured data sources, such as websites and Twitter. Knowledge discovery in databases (KDD) is a popular term in the data science community, and this chapter provides step-by-step examples that give a holistic overview of the subject. Sections on web scraping, data transformation, and data visualization are included. Examples of how to leverage packages such as rvest and httr to perform such operations are also discussed at length.

⦁ Chapter 5, Data Analysis with R, covers a general introduction to data types and data categories in R as they apply to machine learning, manipulating strings and dates, and charting with R. This chapter is essentially a consolidation of topics that are found elsewhere in the book, but in a more concise format. This chapter can hence be used as a standalone section of the book that does not depend on any other chapter and can be used to gain familiarity with the topics discussed.

⦁ Chapter 6, Machine Learning with R, provides a detailed overview of using R for predictive analytics, more generally known as machine learning. It starts out with linear regression and gradually progresses to more in-depth topics in ML, such as decision trees, random forests, and support vector machines (SVMs). Extensively worked-out, hands-on examples, along with visualizations, complement the theoretical discussions in this chapter. The chapter concludes with a discussion on neural networks, one of the most popular fields today in machine learning.

⦁ Chapter 7, Forecasting and ML App with R, includes an advanced R Shiny application, complete with custom CSS style sheets, Google fonts, modified data table formats, and the like, for forecasting the revenue and sales of pharmaceutical medications in the UK using the NHS dataset. Such datasets are also known as real-world datasets in the sense that they contain actual data pertaining to physicians' prescribing activities. The application is fully reactive; that is, changing the controls on the frontend will immediately run the respective forecasting algorithm and update the forecast tables. We have also used Markov chain Monte Carlo (MCMC) sampling, which is available in Prophet, Facebook's forecasting package.

⦁ Chapter 8, Neural Networks and Deep Learning, presents a comprehensive discussion, along with hands-on examples, of using R for machine learning with two of the most popular approaches: neural networks and their more advanced variation, deep learning. Indeed, some of the most successful machine learning projects in the world today, such as self-driving cars and automated assistants such as Siri, are powered by deep learning. This chapter gives readers an opportunity to delve into these areas and learn how they, too, can apply some of the same algorithms driving successes in the field of machine learning today.

⦁ Chapter 9, Markovian in R, is aimed at more advanced users who are interested in learning about Markov processes that involve finding latent (or hidden) data from information in datasets. This is essentially a part of a field known as Bayesian analysis, which allows machine learning practitioners to model states that are not directly visible. Markov models are used in fields such as natural language processing and object recognition.

⦁ Chapter 10, Visualizing Data, provides a comprehensive introduction to various plotting libraries in R. In particular, libraries such as ggplot2 and rCharts, as well as mapping libraries, are discussed at length. R is well known for its libraries that are capable of creating professional, presentation-grade visualizations. The chapter walks the reader through many of the plotting libraries that have made R a mainstay of the data visualization field.

⦁ Chapter 11, Going to Production with R, provides an introduction to the Shiny R package, a tool for the development of interactive applications. This chapter delves into how it works, how reactivity works, the basics of its template, how to build a basic application, and how to build one using a real dataset. If you want a package for presenting your data to people who are unfamiliar with the R language, learning Shiny is a good place to start.

⦁ Chapter 12, Large Scale Data Analytics with Hadoop, covers Apache Spark, an engine for large-scale data processing, similar but not identical to Apache Hadoop. Since its focus is on processing, you can use it entirely from your RStudio console. This chapter teaches how to install it and take your first steps with sparklyr, an R package that provides a Spark backend for the dplyr package. In this way, you can use dplyr functions to manipulate large datasets in the Spark cluster.

⦁ Chapter 13, R on Cloud, takes an in-depth look at using AzureML on the Microsoft Azure (cloud) platform. Cloud computing has allowed companies across the world to transition from a traditional data center-oriented architecture to a cloud-based decentralized environment. Unsurprisingly, machine learning has become a major part of the success of the cloud due to the ease of deploying multi-node clusters for large-scale machine learning. AzureML is an easy-to-use web-based platform from Microsoft that allows even new data scientists to get a jump start on machine learning via a graphical interface.

⦁ Appendix A, The Road Ahead, introduces the reader to various resources on the web, such as blogs and forums, for learning more about R. The world of R is rapidly evolving, and in this appendix, we share some insights on the specific resources that will help seasoned data scientists stay abreast of all the developments in R today.

About the Authors

⦁ Vitor Bianchi Lanzetta
Vitor Bianchi Lanzetta (@vitorlanzetta) has a master's degree in Applied Economics (University of São Paulo, USP) and works as a data scientist in a tech start-up named RedFox Digital Solutions. He has also authored a book called R Data Visualization Recipes. The things he enjoys the most are statistics, economics, and sports of all kinds (electronics included). His blog, made in partnership with Ricardo Anjoleto Farias (@R_A_Farias), can be found at ArcadeData dot org; they fondly call it R-Cade Data.

⦁ Nataraj Dasgupta
Nataraj Dasgupta is the vice president of advanced analytics at RxDataScience Inc. Nataraj has been in the IT industry for more than 19 years, and has worked in the technical and analytics divisions of Philip Morris, IBM, UBS Investment Bank, and Purdue Pharma. At Purdue Pharma, Nataraj led the data science division, where he developed the company's award-winning big data and machine learning platform. Prior to Purdue, at UBS, he held the role of Associate Director, working with high-frequency and algorithmic trading technologies in the foreign exchange trading division of the bank.

⦁ Ricardo Anjoleto Farias
Ricardo Anjoleto Farias is an economist who graduated from the Universidade Estadual de Maringá in 2014. In addition to being a sports enthusiast (electronic or otherwise) and enjoying a good barbecue, he also likes math, statistics, and related subjects. His first contact with R was when he embarked on his master's degree, and since then, he has tried to improve his skills with this powerful tool.
