▶Book Description
Apache Hadoop is one of the most popular platforms for big data processing, and can be combined with a host of other big data tools to build powerful analytics solutions. Big Data Analytics with Hadoop 3 shows you how to do just that, using practical examples to explain the software and its benefits.
Once you have taken a tour of Hadoop 3's latest features, you will get an overview of HDFS, MapReduce, and YARN, and how they enable faster, more efficient big data processing. You will then learn how to integrate Hadoop with open source tools such as Python and R to analyze and visualize data and to perform statistical computing on big data. From there, you will explore how to use Hadoop 3 with Apache Spark and Apache Flink for real-time data analytics and stream processing. Finally, you will learn how to use Hadoop to build analytics solutions in the cloud, and how to build an end-to-end pipeline for big data analysis through practical use cases.
By the end of this book, you will be well versed in the analytical capabilities of the Hadoop ecosystem. You will be able to build powerful solutions to perform big data analytics and gain insights from your data.
▶What You Will Learn
⦁ Explore the new features of Hadoop 3 along with HDFS, YARN, and MapReduce
⦁ Become well versed in the analytical capabilities of the Hadoop ecosystem using practical examples
⦁ Integrate Hadoop with R and Python for more efficient big data processing
⦁ Learn to use Hadoop with Apache Spark and Apache Flink for real-time data analytics
⦁ Set up a Hadoop cluster on the AWS cloud
⦁ Perform big data analytics on AWS using Elastic MapReduce (EMR)
▶Key Features
⦁ Learn Hadoop 3 to build effective big data analytics solutions on-premises and in the cloud
⦁ Integrate Hadoop with other big data tools such as R, Python, Apache Spark, and Apache Flink
⦁ Exploit big data using Hadoop 3 with real-world examples
▶Who This Book Is For
Big Data Analytics with Hadoop 3 is for you if you are looking to build high-performance analytics solutions for your enterprise or business using Hadoop 3's powerful features, or if you are new to big data analytics. A basic understanding of the Java programming language is required.
▶What This Book Covers
⦁ Chapter 1, Introduction to Hadoop, introduces you to the world of Hadoop and its core components, namely, HDFS and MapReduce.
⦁ Chapter 2, Overview of Big Data Analytics, introduces the process of examining large datasets to uncover patterns in data, generating reports, and gathering valuable insights.
⦁ Chapter 3, Big Data Processing with MapReduce, introduces MapReduce, the fundamental concept behind most big data computing and processing systems.
⦁ Chapter 4, Scientific Computing and Big Data Analysis with Python and Hadoop, provides an introduction to Python and an analysis of big data using Hadoop with the aid of Python packages.
⦁ Chapter 5, Statistical Big Data Computing with R and Hadoop, provides an introduction to R and demonstrates how to use R to perform statistical computing on big data using Hadoop.
⦁ Chapter 6, Batch Analytics with Apache Spark, introduces you to Apache Spark and demonstrates how to use Spark for big data analytics based on a batch processing model.
⦁ Chapter 7, Real-Time Analytics with Apache Spark, introduces the stream processing model of Apache Spark and demonstrates how to build streaming-based, real-time analytical applications.
⦁ Chapter 8, Batch Analytics with Apache Flink, covers Apache Flink and how to use it for big data analytics based on a batch processing model.
⦁ Chapter 9, Stream Processing with Apache Flink, introduces you to DataStream APIs and stream processing using Flink. Flink will be used to receive and process real-time event streams and store the aggregates and results in a Hadoop cluster.
⦁ Chapter 10, Visualizing Big Data, introduces you to the world of data visualization using various tools and technologies such as Tableau.
⦁ Chapter 11, Introduction to Cloud Computing, introduces cloud computing and concepts such as IaaS, PaaS, and SaaS. You will also get a glimpse into the top cloud providers.
⦁ Chapter 12, Using Amazon Web Services, introduces you to AWS and the AWS services that are useful for big data analytics, and shows how to use Elastic MapReduce (EMR) to set up a Hadoop cluster in the AWS cloud.