Name: Mastering Hadoop 3
Price: 26000 KRW
Availability: OnlineOnly
Author: Chanchal Singh, Manish Kumar

Mastering Hadoop 3 상세페이지

출간 정보

2019.02.28 전자책 출간

듣기 기능

TTS(듣기) 미지원

파일 정보

PDF
531 쪽
11.9MB

지원 환경

앱
웹
PC뷰어
PAPER

ISBN

9781788628327

UCI

Mastering Hadoop 3

작품 정보

▶Book Description
Apache Hadoop is one of the most popular big data solutions for distributed storage and for processing large chunks of data. With Hadoop 3, Apache promises to provide a high-performance, more fault-tolerant, and highly efficient big data processing platform, with a focus on improved scalability and increased efficiency.

With this guide, you’ll understand advanced concepts of the Hadoop ecosystem tool. You’ll learn how Hadoop works internally, study advanced concepts of different ecosystem tools, discover solutions to real-world use cases, and understand how to secure your cluster. It will then walk you through HDFS, YARN, MapReduce, and Hadoop 3 concepts. You’ll be able to address common challenges like using Kafka efficiently, designing low latency, reliable message delivery Kafka systems, and handling high data volumes. As you advance, you’ll discover how to address major challenges when building an enterprise-grade messaging system, and how to use different stream processing systems along with Kafka to fulfil your enterprise goals.

By the end of this book, you’ll have a complete understanding of how components in the Hadoop ecosystem are effectively integrated to implement a fast and reliable data pipeline, and you’ll be equipped to tackle a range of real-world problems in data pipelines.

▶What You Will Learn
⦁ Gain an in-depth understanding of distributed computing using Hadoop 3
⦁ Develop enterprise-grade applications using Apache Spark, Flink, and more
⦁ Build scalable and high-performance Hadoop data pipelines with security, monitoring, and data governance
⦁ Explore batch data processing patterns and how to model data in Hadoop
⦁ Master best practices for enterprises using, or planning to use, Hadoop 3 as a data platform
⦁ Understand security aspects of Hadoop, including authorization and authentication

▶Key Features
⦁ Get to grips with the newly introduced features and capabilities of Hadoop 3
⦁ Crunch and process data using MapReduce, YARN, and a host of tools within the Hadoop ecosystem
⦁ Sharpen your Hadoop skills with real-world case studies and code

▶Who This Book Is For
If you want to become a big data professional by mastering the advanced concepts of Hadoop, this book is for you. You’ll also find this book useful if you’re a Hadoop professional looking to strengthen your knowledge of the Hadoop ecosystem. Fundamental knowledge of the Java programming language and basics of Hadoop is necessary to get started with this book.

▶What this book covers
⦁ Chapter 1, Journey to Hadoop 3, introduces the main concepts of Hadoop and outlines its origin. It further focuses on the features of Hadoop 3. This chapter also provides a logical overview of the Hadoop ecosystem and different Hadoop distributions.

⦁ Chapter 2, Deep Dive into the Hadoop Distributed File System, focuses on the Hadoop Distributed File System and its internal concepts. It also covers HDFS operations in depth, and introduces you to the new functionality added to the HDFS in Hadoop 3, along with covering HDFS caching and HDFS Federation in detail.

⦁ Chapter 3, YARN Resource Management in Hadoop, introduces you to the resource management framework of YARN. It focuses on efficient scheduling of jobs submitted to YARN and provides a brief overview of the pros and cons of the scheduler available in YARN. It also focuses on the YARN features introduced in Hadoop 3, especially the YARN REST API. It also covers the architecture and internals of Apache Slider. It then focuses on Apache Tez, a distributed processing engine, which helps us to optimize applications running on YARN.

⦁ Chapter 4, Internals of MapReduce, introduces a distributed batch processing engine known as Map Reduce. It covers some of the internal concepts of Map Reduce and walks you through each step in detail. It then focuses on a few important parameters and some common patterns in Map Reduce.

⦁ Chapter 5, SQL on Hadoop, covers a few important SQL-like engines present in the Hadoop ecosystem. It starts with the details of the architecture of Presto and then covers some examples with a few popular connectors. It then covers the popular query engine, Hive, and focuses on its architecture and a number of advanced-level concepts. Finally, it covers Impala, a fast processing engine, and its internal architectural concepts in detail.

⦁ Chapter 6, Real-Time Processing Engines, focuses on different engines available for processing, discussing each processing engine individually. It includes details on the internal workings of Spark Framework and the concept of Resilient Distributed Datasets(RDDs). An introduction to the internals of Apache Flink and Apache Storm/Heron are also focal points of this chapter.

⦁ Chapter 7, Widely Used Hadoop Ecosystem Components, introduces you to a few important tools used on the Hadoop platform. It covers Apache Pig, used for ETL operations, and introduces you to a few of the internal concepts of its architecture and operations. It takes you through the details of Apache Kafka and Apache Flume. Apache HBase is also a primary focus of this chapter.

⦁ Chapter 8, Designing Applications in Hadoop, starts with a few advanced-level concepts related to file formats. It then focuses on data compression and serialization concepts in depth, before covering concepts of data processing and data access and moving to use case examples.

⦁ Chapter 9, Real-Time Stream Processing in Hadoop, is focused on designing and implementing real-time and microbatch-oriented applications in Hadoop. This chapter covers how to perform stream data ingestion, along with the role of message queues. It further penetrates some of common stream data-processing patterns, along with low latency design considerations. It elaborates on these concepts with real-time and microbatch case studies.

⦁ Chapter 10, Machine Learning in Hadoop, covers how to design and architect machine learning applications on the Hadoop platform. It addresses some of the common machine learning challenges that you can face in Hadoop, and how to solve those. It walks through different machine learning libraries and processing engines. It covers some of the common steps involved in machine learning and further elaborates on this with a case study.

⦁ Chapter 11, Hadoop in the Cloud, provides an overview of Hadoop operations in the cloud. It covers detailed information on how the Hadoop ecosystem looks in the cloud, how we should manage resources in the cloud, how we create a data pipeline in the cloud, and how we can ensure high availability across the cloud.

⦁ Chapter 12, Hadoop Cluster Profiling, covers tools and techniques for benchmarking and profiling the Hadoop cluster. It also examines aspects of profiling different Hadoop workloads.

⦁ Chapter 13, Who Can Do What in Hadoop, is about securing a Hadoop cluster. It covers the basics of Hadoop security. It further focuses on implementing and designing Hadoop authentication and authorization.

⦁ Chapter 14, Network and Data Security, is an extension to the previous chapter, covering some advanced concepts in Hadoop network and data security. It covers advanced concepts, such as network segmentation, perimeter security, and row/column level security. It also covers encrypting data in motion and data at rest in Hadoop.

⦁ Chapter 15, Monitoring Hadoop, covers the fundamentals of monitoring Hadoop. The chapter is divided into two major sections. One section concerns general Hadoop monitoring, and the remainder of the chapter discusses specialized monitoring for identifying security breaches.

작가 소개

⦁ Chanchal Singh
Chanchal Singh has over half decades experience in Product Development and Architect Design. He has been working very closely with leadership team of various companies including directors ,CTO's and Founding members to define technical road-map for company.He is the Founder and Speaker at meetup group Big Data and AI Pune MeetupExperience Speaks. He is Co-Author of Book Building Data Streaming Application with Apache Kafka. He has a Bachelor's degree in Information Technology from the University of Mumbai and a Master's degree in Computer Application from Amity University. He was also part of the Entrepreneur Cell in IIT Mumbai. His Linkedin Profile can be found at with the username Chanchal Singh.

⦁ Manish Kumar
Manish Kumar is a technical architect at DataMetica Solution Pvt. Ltd. He has approximately 11 years' experience in data management, working as a data architect and product architect. He has extensive experience of building effective ETL pipelines, implementing security over Hadoop, and providing the best possible solutions to data science problems. Before joining the world of big data, he worked as a tech lead for Sears Holding, India. Manish has a bachelor's degree in information technology, and he is a coauthor of Building Data Streaming Applications with Apache Kafka.

리뷰

0.0

구매자 별점

0명 평가

별점 분포 보기

이 작품을 평가해 주세요!

리뷰 작성 유의사항

건전한 리뷰 정착 및 양질의 리뷰를 위해 아래 해당하는 리뷰는 비공개 조치될 수 있음을 안내드립니다.

타인에게 불쾌감을 주는 욕설
비속어나 타인을 비방하는 내용
특정 종교, 민족, 계층을 비방하는 내용
해당 작품의 줄거리나 리디 서비스 이용과 관련이 없는 내용
의미를 알 수 없는 내용
광고 및 반복적인 글을 게시하여 서비스 품질을 떨어트리는 내용
저작권상 문제의 소지가 있는 내용
다른 리뷰에 대한 반박이나 논쟁을 유발하는 내용

* 결말을 예상할 수 있는 리뷰는 자제하여 주시기 바랍니다.

이 외에도 건전한 리뷰 문화 형성을 위한 운영 목적과 취지에 맞지 않는 내용은 담당자에 의해 리뷰가 비공개 처리가 될 수 있습니다.

아직 등록된 리뷰가 없습니다.
첫 번째 리뷰를 남겨주세요!

구매자 표시 기준은 무엇인가요?

'구매자' 표시는 유료 작품 결제 후 다운로드하거나 리디셀렉트 작품을 다운로드 한 경우에만 표시됩니다.

무료 작품 (프로모션 등으로 무료로 전환된 작품 포함): '구매자'로 표시되지 않습니다.
시리즈 내 무료 작품: '구매자'로 표시되지 않습니다. 하지만 같은 시리즈의 유료 작품을 결제한 뒤 리뷰를 수정하거나 재등록하면 '구매자'로 표시됩니다.
영구 삭제: 작품을 영구 삭제해도 '구매자' 표시는 남아있습니다.
결제 취소: '구매자' 표시가 자동으로 사라집니다.