

Mastering Hadoop 3

Big data processing at scale to unlock unique business insights

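Purchase
E-book list price: 26,000 won
Sale price: 26,000 won
Publication: e-book published 2019.02.28
Listening feature: TTS (text-to-speech) supported
File information: PDF, 531 pages, 11.9MB
Supported environments: PC viewer, PAPER
ISBN: 9781788628327
UCI: -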

About the Book

▶Book Description
Apache Hadoop is one of the most popular big data solutions for distributed storage and for processing large chunks of data. With Hadoop 3, Apache promises to provide a high-performance, more fault-tolerant, and highly efficient big data processing platform, with a focus on improved scalability and increased efficiency.

With this guide, you’ll understand advanced concepts of the Hadoop ecosystem tools. You’ll learn how Hadoop works internally, study advanced concepts of different ecosystem tools, discover solutions to real-world use cases, and understand how to secure your cluster. The book then walks you through HDFS, YARN, MapReduce, and Hadoop 3 concepts. You’ll be able to address common challenges like using Kafka efficiently, designing low-latency Kafka systems with reliable message delivery, and handling high data volumes. As you advance, you’ll discover how to address major challenges when building an enterprise-grade messaging system, and how to use different stream processing systems along with Kafka to fulfil your enterprise goals.

By the end of this book, you’ll have a complete understanding of how components in the Hadoop ecosystem are effectively integrated to implement a fast and reliable data pipeline, and you’ll be equipped to tackle a range of real-world problems in data pipelines.

▶What You Will Learn
⦁ Gain an in-depth understanding of distributed computing using Hadoop 3
⦁ Develop enterprise-grade applications using Apache Spark, Flink, and more
⦁ Build scalable and high-performance Hadoop data pipelines with security, monitoring, and data governance
⦁ Explore batch data processing patterns and how to model data in Hadoop
⦁ Master best practices for enterprises using, or planning to use, Hadoop 3 as a data platform
⦁ Understand security aspects of Hadoop, including authorization and authentication

▶Key Features
⦁ Get to grips with the newly introduced features and capabilities of Hadoop 3
⦁ Crunch and process data using MapReduce, YARN, and a host of tools within the Hadoop ecosystem
⦁ Sharpen your Hadoop skills with real-world case studies and code

▶Who This Book Is For
If you want to become a big data professional by mastering the advanced concepts of Hadoop, this book is for you. You’ll also find this book useful if you’re a Hadoop professional looking to strengthen your knowledge of the Hadoop ecosystem. Fundamental knowledge of the Java programming language and the basics of Hadoop are necessary to get started with this book.

▶What This Book Covers
⦁ Chapter 1, Journey to Hadoop 3, introduces the main concepts of Hadoop and outlines its origin. It further focuses on the features of Hadoop 3. This chapter also provides a logical overview of the Hadoop ecosystem and different Hadoop distributions.

⦁ Chapter 2, Deep Dive into the Hadoop Distributed File System, focuses on the Hadoop Distributed File System and its internal concepts. It also covers HDFS operations in depth, introduces you to the new functionality added to HDFS in Hadoop 3, and covers HDFS caching and HDFS Federation in detail.

⦁ Chapter 3, YARN Resource Management in Hadoop, introduces you to the resource management framework of YARN. It focuses on efficient scheduling of jobs submitted to YARN and provides a brief overview of the pros and cons of the schedulers available in YARN. It then turns to the YARN features introduced in Hadoop 3, especially the YARN REST API, and covers the architecture and internals of Apache Slider. Finally, it focuses on Apache Tez, a distributed processing engine that helps optimize applications running on YARN.

⦁ Chapter 4, Internals of MapReduce, introduces a distributed batch processing engine known as MapReduce. It covers some of the internal concepts of MapReduce and walks you through each step in detail. It then focuses on a few important parameters and some common patterns in MapReduce.

⦁ Chapter 5, SQL on Hadoop, covers a few important SQL-like engines present in the Hadoop ecosystem. It starts with the details of the architecture of Presto and then covers some examples with a few popular connectors. It then covers the popular query engine, Hive, and focuses on its architecture and a number of advanced-level concepts. Finally, it covers Impala, a fast processing engine, and its internal architectural concepts in detail.

⦁ Chapter 6, Real-Time Processing Engines, focuses on different engines available for processing, discussing each processing engine individually. It includes details on the internal workings of the Spark framework and the concept of Resilient Distributed Datasets (RDDs). Introductions to the internals of Apache Flink and Apache Storm/Heron are also focal points of this chapter.

⦁ Chapter 7, Widely Used Hadoop Ecosystem Components, introduces you to a few important tools used on the Hadoop platform. It covers Apache Pig, used for ETL operations, and introduces you to a few of the internal concepts of its architecture and operations. It takes you through the details of Apache Kafka and Apache Flume. Apache HBase is also a primary focus of this chapter.

⦁ Chapter 8, Designing Applications in Hadoop, starts with a few advanced-level concepts related to file formats. It then focuses on data compression and serialization concepts in depth, before covering concepts of data processing and data access and moving to use case examples.

⦁ Chapter 9, Real-Time Stream Processing in Hadoop, focuses on designing and implementing real-time and microbatch-oriented applications in Hadoop. This chapter covers how to perform stream data ingestion, along with the role of message queues. It further explores some common stream data-processing patterns, along with low-latency design considerations. It elaborates on these concepts with real-time and microbatch case studies.

⦁ Chapter 10, Machine Learning in Hadoop, covers how to design and architect machine learning applications on the Hadoop platform. It addresses some of the common machine learning challenges that you can face in Hadoop, and how to solve them. It walks through different machine learning libraries and processing engines. It covers some of the common steps involved in machine learning and further elaborates on this with a case study.

⦁ Chapter 11, Hadoop in the Cloud, provides an overview of Hadoop operations in the cloud. It covers detailed information on how the Hadoop ecosystem looks in the cloud, how we should manage resources in the cloud, how we create a data pipeline in the cloud, and how we can ensure high availability across the cloud.

⦁ Chapter 12, Hadoop Cluster Profiling, covers tools and techniques for benchmarking and profiling the Hadoop cluster. It also examines aspects of profiling different Hadoop workloads.

⦁ Chapter 13, Who Can Do What in Hadoop, is about securing a Hadoop cluster. It covers the basics of Hadoop security. It further focuses on implementing and designing Hadoop authentication and authorization.

⦁ Chapter 14, Network and Data Security, is an extension of the previous chapter, covering advanced concepts in Hadoop network and data security, such as network segmentation, perimeter security, and row/column-level security. It also covers encrypting data in motion and data at rest in Hadoop.

⦁ Chapter 15, Monitoring Hadoop, covers the fundamentals of monitoring Hadoop. The chapter is divided into two major sections. One section concerns general Hadoop monitoring, and the remainder of the chapter discusses specialized monitoring for identifying security breaches.

About the Authors

⦁ Chanchal Singh
Chanchal Singh has over half a decade of experience in product development and architecture design. He has worked closely with the leadership teams of various companies, including directors, CTOs, and founding members, to define their technical roadmaps. He is the founder of, and a speaker at, the meetup group Big Data and AI Pune MeetUp - Experience Speaks. He is a coauthor of the book Building Data Streaming Applications with Apache Kafka. He has a bachelor's degree in information technology from the University of Mumbai and a master's degree in computer application from Amity University. He was also part of the Entrepreneur Cell at IIT Mumbai. His LinkedIn profile can be found under the username Chanchal Singh.

⦁ Manish Kumar
Manish Kumar is a technical architect at DataMetica Solution Pvt. Ltd. He has approximately 11 years' experience in data management, working as a data architect and product architect. He has extensive experience in building effective ETL pipelines, implementing security over Hadoop, and providing the best possible solutions to data science problems. Before joining the world of big data, he worked as a tech lead for Sears Holding, India. Manish has a bachelor's degree in information technology, and he is a coauthor of Building Data Streaming Applications with Apache Kafka.


