
Natural Language Processing and Computational Linguistics

A practical guide to text analysis with Python, Gensim, spaCy, and Keras

E-book list price
19,000 KRW
Sale price
19,000 KRW
Publication
  • E-book published 2018.06.29
Listening
TTS (text-to-speech) supported
File information
  • PDF
  • 298 pages
  • 7.7MB
Supported environments
  • PC viewer
  • PAPER
ISBN
9781788837033
ECN
-

About the Book

▶Book Description
Modern text analysis is now very accessible using Python and open source tools; discover how you can perform modern text analysis in this era of abundant textual data.

This book shows you how to use natural language processing and computational linguistics algorithms to make inferences and gain insights from the data you have. These algorithms are based on statistical machine learning and artificial intelligence techniques. The tools to work with these algorithms are available to you right now - with Python, and tools like Gensim and spaCy.

You'll start by learning about data cleaning, and then how to perform computational linguistics from first concepts. You're then ready to explore the more sophisticated areas of statistical NLP and deep learning using Python, with realistic language and text samples. You'll learn to tag, parse, and model text using the best tools. You'll gain hands-on knowledge of the best frameworks to use, and you'll know when to choose a tool like Gensim for topic models, and when to work with Keras for deep learning.

This book balances theory and practical hands-on examples, so you can learn about and conduct your own natural language processing projects and computational linguistics. You'll discover the rich ecosystem of Python tools you have available to conduct NLP - and enter the interesting world of modern text analysis.

▶What You Will Learn
⦁ Understand why text analysis is important in our modern age
⦁ Understand NLP terminology and get to know the Python tools and datasets
⦁ Learn how to pre-process and clean textual data
⦁ Convert textual data into vector space representations
⦁ Use spaCy to process text
⦁ Train your own NLP models for computational linguistics
⦁ Apply statistical learning and topic modeling algorithms to text using Gensim and scikit-learn
⦁ Employ deep learning techniques for text analysis using Keras

▶Key Features
⦁ Discover the open source Python text analysis ecosystem, using spaCy, Gensim, scikit-learn, and Keras
⦁ Hands-on text analysis with Python, featuring natural language processing and computational linguistics algorithms
⦁ Learn deep learning techniques for text analysis

▶Who This Book Is For
Fluency in Python is assumed, but the book attempts to be accessible even to Python beginners. A basic grounding in statistics is helpful. Given that this book introduces Natural Language Processing from first principles, it helps, although it is not a requirement, to be familiar with basic linguistics.

▶What this book covers
⦁ Chapter 1, What is Text Analysis? There is no time like now to do text analysis - we have an abundance of easily available data and powerful, free open source tools to conduct our analysis, and research on Machine Learning, Computational Linguistics, and computing with text is progressing at a pace we have not seen before. In this chapter, we will go into detail about what exactly text analysis is, and the motivations for studying and understanding text analysis.

⦁ Chapter 2, Python Tips for Text Analysis. We mentioned in Chapter 1, What is Text Analysis, that we will be using Python throughout the book because it is an easy-to-use and powerful language. In this chapter, we will substantiate these claims, while also providing a revision course in basic Python for text analysis. Why is this important? While we expect readers of the book to have a background in Python and high-school math, it is still possible that it's been a while since you've written Python code - and even if you have, the Python code you write during text analysis and string manipulation is quite different from, say, building a website using the web framework Django.
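
As a taste of the kind of string handling the chapter revisits, here is a minimal sketch using only the Python standard library; the sample sentence is made up for illustration:

    from collections import Counter
    import string

    # A made-up sample sentence for illustration.
    text = "Text analysis in Python is fun; Python makes text analysis easy."

    # Lowercase the text, strip punctuation, and split it into tokens.
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = cleaned.split()

    # Count how often each word appears.
    print(Counter(tokens).most_common(3))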

⦁ Chapter 3, spaCy's Language Models. While we introduced text analysis in the previous chapter, we did not discuss any of the technical details behind building a text analysis pipeline. In this chapter, we will introduce you to spaCy's language models - these will serve as the first step in text analysis, and are the first building block in our pipelines. We will also introduce the reader to spaCy and how we can use spaCy to help us in our text analysis tasks, and talk about some of its more powerful functionalities, such as POS-tagging and NER-tagging. We will finish up with an example of how we can preprocess data quickly and efficiently using spaCy.
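
As a rough illustration of this kind of spaCy-based preprocessing (a sketch, not the book's own code), the snippet below loads a pretrained English pipeline - assuming the small model en_core_web_sm is installed - and keeps only the lemmas of non-stop-word, non-punctuation tokens:

    import spacy

    # Assumes the small English model is installed:
    #   python -m spacy download en_core_web_sm
    nlp = spacy.load("en_core_web_sm")

    doc = nlp("The quick brown foxes were jumping over the lazy dogs in Paris.")

    # Keep the lemma of every token that is neither a stop word nor punctuation.
    cleaned = [token.lemma_ for token in doc if not token.is_stop and not token.is_punct]
    print(cleaned)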

⦁ Chapter 4, Gensim - Vectorizing Text and Transformations and n-grams. While we have worked with raw textual data so far, any Machine Learning or information retrieval related algorithm will not accept data like this - which is why we use mathematical constructs called Vectors to help the algorithms make sense of the text. We will introduce Gensim as the tool to conduct this transformation, as well as scikit-learn, which will be used before we plug the text into any sort of further analysis. A huge part of preprocessing carries over into our vectorization step - bi-grams, tri-grams, and n-grams, as well as using term frequencies to get rid of some words which we deem not to be useful.
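
A minimal Gensim vectorization sketch of the dictionary, bag-of-words, and TF-IDF steps described above (the toy documents are invented for illustration, not taken from the book):

    from gensim.corpora import Dictionary
    from gensim.models import TfidfModel

    # Toy corpus of pre-tokenized documents, made up for illustration.
    texts = [
        ["machine", "learning", "with", "text"],
        ["text", "analysis", "with", "python"],
        ["python", "machine", "learning"],
    ]

    # Map each unique token to an integer id, then build bag-of-words vectors.
    dictionary = Dictionary(texts)
    bow_corpus = [dictionary.doc2bow(doc) for doc in texts]

    # Re-weight the raw counts with TF-IDF.
    tfidf = TfidfModel(bow_corpus)
    print(tfidf[bow_corpus[0]])  # list of (token_id, weight) pairs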

⦁ Chapter 5, POS-Tagging and Its Applications. Chapters 1 and 2 introduced text analysis and Python, and chapters 3 and 4 helped us set up our code for more advanced text analysis. This chapter discusses the first of these advanced techniques - part-of-speech tagging, popularly called POS-tagging. We will study what parts of speech exist, how to identify them in our documents, and what possible uses these POS-tags have.
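
For a flavor of what POS-tags look like in practice, here is a small spaCy sketch (again assuming the en_core_web_sm model is installed; the sentence is an invented example):

    import spacy

    nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed
    doc = nlp("Marie Curie won the Nobel Prize in Physics.")

    # Print the coarse (pos_) and fine-grained (tag_) part-of-speech tag of each token.
    for token in doc:
        print(token.text, token.pos_, token.tag_)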

⦁ Chapter 6, NER-Tagging and Its Applications. In the previous chapter, we saw how we can use spaCy's language pipeline - POS-tagging is a very powerful tool, and we will now explore another interesting usage, NER-tagging. We will discuss what exactly this is from both a linguistic and a text analysis point of view, as well as detail examples of its usage and how to train our own NER-tagger with spaCy.
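
A minimal NER sketch with spaCy's pretrained pipeline (en_core_web_sm assumed installed; the sentence is an illustrative example, not from the book):

    import spacy

    nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed
    doc = nlp("Google was founded by Larry Page and Sergey Brin in California in 1998.")

    # Each recognized entity span carries a label such as ORG, PERSON, GPE, or DATE.
    for ent in doc.ents:
        print(ent.text, ent.label_)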

⦁ Chapter 7, Dependency Parsing. We saw in Chapters 5 and 6 how spaCy’s language pipeline performs a variety of complex Computational Linguistics algorithms, such as POS-tagging and NER-tagging. This isn’t all spaCy packs though, and in this chapter we will explore the power of dependency parsing and how it can be used in a variety of contexts and applications. We will have a look at the theory of dependency parsing before moving on to using it with spaCy, as well as training our own dependency parsers.
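
As a quick taste of dependency parsing with spaCy (a sketch, not the chapter's code), each token exposes its dependency relation and its syntactic head:

    import spacy

    nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed
    doc = nlp("The cat chased the mouse across the garden.")

    # Show each token's dependency label and the head token it attaches to.
    for token in doc:
        print(f"{token.text:<8} {token.dep_:<8} head: {token.head.text}")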

⦁ Chapter 8, Topic Models. Until now, we have dealt with Computational Linguistics algorithms and spaCy, and understood how to use these computational linguistics algorithms to annotate our data, as well as understand sentence structure. While these algorithms helped us understand the finer details of our text, we still didn't get the big picture of our data - what kind of words appear more often than others in our corpus? Can we group our data or find underlying themes? We will be attempting to answer these questions and more in this chapter.
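
A tiny topic-modeling sketch with Gensim's LdaModel (the four toy documents are invented to show two rough themes; real topic models need far larger corpora):

    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    # Toy pre-tokenized corpus with two rough themes: rivers and banking.
    texts = [
        ["bank", "river", "water", "fishing"],
        ["bank", "money", "loan", "interest"],
        ["water", "river", "boat", "fishing"],
        ["money", "loan", "bank", "credit"],
    ]

    dictionary = Dictionary(texts)
    corpus = [dictionary.doc2bow(doc) for doc in texts]

    # Fit a 2-topic LDA model and inspect the top words per topic.
    lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=10, random_state=42)
    for topic_id, words in lda.print_topics(num_words=4):
        print(topic_id, words)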

⦁ Chapter 9, Advanced Topic Modeling. We saw in the previous chapter the power of topic modeling, and how intuitive a way it can be to understand our data, as well as explore it. In this chapter, we will further explore the utility of these topic models, and also how to create more useful topic models which better encapsulate the topics that may be present in a corpus. Since topic modeling is a way to understand the documents of a corpus, it also means that we can analyze documents in ways we have not done before.
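
This summary does not spell out the chapter's exact method, but one common way to judge whether one topic model is more useful than another is a coherence score; a minimal sketch with Gensim's CoherenceModel, reusing the same kind of toy corpus as above:

    from gensim.corpora import Dictionary
    from gensim.models import CoherenceModel, LdaModel

    texts = [
        ["bank", "river", "water", "fishing"],
        ["bank", "money", "loan", "interest"],
        ["water", "river", "boat", "fishing"],
        ["money", "loan", "bank", "credit"],
    ]
    dictionary = Dictionary(texts)
    corpus = [dictionary.doc2bow(doc) for doc in texts]

    # Train candidate models with different topic counts and compare c_v coherence.
    for k in (2, 3):
        lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=k, passes=10, random_state=42)
        cm = CoherenceModel(model=lda, texts=texts, dictionary=dictionary, coherence="c_v")
        print(k, "topics -> coherence", round(cm.get_coherence(), 3))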

⦁ Chapter 10, Clustering and Classifying Text. In the previous chapter we studied topic models and how they can help us in organizing and better understanding our documents and their sub-structure. We will now move on to our next set of Machine Learning algorithms, for two particular tasks - clustering and classification. We will learn the intuitive reasoning behind these two tasks, as well as how to perform them using the popular Python Machine Learning library, scikit-learn.
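
A small scikit-learn sketch of the clustering side (made-up documents about two rough themes; classification would swap KMeans for a supervised estimator such as logistic regression):

    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import TfidfVectorizer

    # A handful of made-up documents about two rough themes: sport and politics.
    docs = [
        "the striker scored a late goal in the football match",
        "the election results were announced by the government",
        "the midfielder was injured during the match",
        "parliament passed the new government budget",
    ]

    # Turn the documents into TF-IDF vectors, then cluster them into two groups.
    X = TfidfVectorizer().fit_transform(docs)
    labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)
    print(labels)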

⦁ Chapter 11, Similarity Queries and Summarization. Once we have begun to represent text documents in the form of vector representations, it is possible to start finding the similarity or distance between documents - and that is exactly what we will learn about in this chapter. We are now aware of a variety of different vector representations, from standard bag-of-words or TF-IDF to topic model representations of text documents. We will also learn about some very useful features implemented in Gensim - summarization and keyword extraction - and how to use them.
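
A minimal similarity-query sketch with Gensim (toy documents for illustration): build an index over bag-of-words vectors and score a new query document against the corpus.

    from gensim.corpora import Dictionary
    from gensim.similarities import MatrixSimilarity

    # Toy pre-tokenized corpus.
    texts = [
        ["cat", "sat", "mat"],
        ["dog", "sat", "log"],
        ["cat", "chased", "dog"],
    ]
    dictionary = Dictionary(texts)
    corpus = [dictionary.doc2bow(doc) for doc in texts]

    # Build a dense cosine-similarity index over the corpus.
    index = MatrixSimilarity(corpus, num_features=len(dictionary))

    # Score a new document against every document in the corpus.
    query = dictionary.doc2bow(["cat", "dog"])
    print(list(index[query]))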

⦁ Chapter 12, Word2Vec, Doc2Vec and Gensim. We have talked about vectors a lot throughout the book - they are used to understand and represent our textual data in a mathematical form, and all the Machine Learning methods we use rely on these representations. We will be taking this one step further, and use Machine Learning techniques to generate vector representations of words which better encapsulate the meaning of a word. This technique is generally referred to as word embeddings, and Word2Vec and Doc2Vec are two popular variations of it.
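
A minimal Word2Vec sketch with Gensim (toy sentences invented for illustration; real embeddings need far more data). Parameter names follow recent Gensim releases (4.x) - older versions used size instead of vector_size:

    from gensim.models import Word2Vec

    # Tiny toy corpus of tokenized sentences, made up for illustration.
    sentences = [
        ["king", "rules", "the", "kingdom"],
        ["queen", "rules", "the", "kingdom"],
        ["dog", "barks", "at", "night"],
        ["cat", "sleeps", "at", "night"],
    ]

    # Train a small model; each word becomes a 50-dimensional vector.
    model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=50, seed=42)

    print(model.wv["king"][:5])                   # first few dimensions of one embedding
    print(model.wv.most_similar("king", topn=2))  # nearest neighbours in vector space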

⦁ Chapter 13, Deep Learning for Text. Until now, we have explored the usage of Machine Learning for text in a variety of contexts - topic modeling, clustering, classification, text summarization, and even our POS-taggers and NER-taggers were trained using Machine Learning. In this chapter, we will begin to explore one of the most cutting-edge forms of Machine Learning - Deep Learning. Deep Learning is a form of ML where we use biologically inspired structures to generate algorithms and architectures to perform various tasks on text. Some of these tasks are text generation, classification, and word embeddings. In this chapter, we will discuss some of the underpinnings of Deep Learning as well as how to implement our own Deep Learning models for text.

⦁ Chapter 14, Keras and spaCy for Deep Learning. In the previous chapter, we introduced Deep Learning techniques for text, and to get a taste of using Neural Networks, we attempted to generate text using an RNN. In this chapter, we will take a closer look at Deep Learning for text, and in particular, how to set up a Keras model which can perform classification, as well as how to incorporate Deep Learning into spaCy pipelines.
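
A minimal Keras text-classification sketch along these lines (random integer-encoded data stands in for a real tokenized corpus; imports use tensorflow.keras, though standalone keras imports also work):

    import numpy as np
    from tensorflow.keras.layers import Dense, Embedding, LSTM
    from tensorflow.keras.models import Sequential

    vocab_size, max_len = 1000, 20

    # Random integer-encoded "documents" and binary labels, purely for illustration.
    X = np.random.randint(1, vocab_size, size=(100, max_len))
    y = np.random.randint(0, 2, size=(100,))

    # A small Embedding -> LSTM -> sigmoid classifier.
    model = Sequential([
        Embedding(input_dim=vocab_size, output_dim=32),
        LSTM(32),
        Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X, y, epochs=2, batch_size=16, verbose=0)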

⦁ Chapter 15, Sentiment Analysis and ChatBots. By now, we are equipped with the skills needed to get started on text analysis projects, and to also take a shot at more complicated, meatier projects. Two common text analysis projects which encapsulate a lot of the concepts we have explored throughout the book are sentiment analysis and chatbots. In fact, we've already touched upon all the methods we will be using for these projects, and this chapter will serve as a guide to how one can build such an application on their own. In this chapter, we will not be providing the code to build a chatbot or sentiment analysis pipeline from the first step to the last, but will rather introduce the reader to a variety of techniques that will help when setting up such a project.

About the Author

⦁ Bhargav Srinivasa-Desikan
Bhargav Srinivasa-Desikan is a research engineer working for INRIA in Lille, France. He is a part of the MODAL (Models of Data Analysis and Learning) team, and has a deep interest in modern text analysis. He works on metric learning, predictor aggregation, and data visualization. He is a regular contributor to the Python open source community, and completed Google Summer of Code in 2016 with Gensim where he implemented Dynamic Topic Models. He is a regular speaker at PyCons and PyDatas across Europe and Asia, and conducts tutorials on text analysis using Python.
