Natural Language Processing and Computational Linguistics

A practical guide to text analysis with Python, Gensim, spaCy, and Keras

E-book list price: 19,000 KRW
Sale price: 19,000 KRW
Publication: e-book released 2018.06.29
Listening: TTS (text-to-speech) supported
File: PDF, 298 pages, 7.7 MB
Supported environments: PC viewer, PAPER (e-reader)
ISBN: 9781788837033

Book Information

▶Book Description
Modern text analysis is now very accessible with Python and open source tools; discover how you can perform it in this era of abundant textual data.

This book shows you how to use natural language processing and computational linguistics algorithms to make inferences and gain insights from the data you have. These algorithms are based on statistical machine learning and artificial intelligence techniques. The tools to work with them are available to you right now, with Python and libraries such as Gensim and spaCy.

You'll start by learning about data cleaning, and then how to perform computational linguistics from first concepts. You're then ready to explore the more sophisticated areas of statistical NLP and deep learning using Python, with realistic language and text samples. You'll learn to tag, parse, and model text using the best tools. You'll gain hands-on knowledge of the best frameworks to use, and you'll know when to choose a tool like Gensim for topic models, and when to work with Keras for deep learning.

This book balances theory and practical hands-on examples, so you can learn about and conduct your own natural language processing projects and computational linguistics. You'll discover the rich ecosystem of Python tools you have available to conduct NLP - and enter the interesting world of modern text analysis.

▶What You Will Learn
⦁ Understand why text analysis is important in our modern age
⦁ Understand NLP terminology and get to know the Python tools and datasets
⦁ Learn how to pre-process and clean textual data
⦁ Convert textual data into vector space representations
⦁ Use spaCy to process text
⦁ Train your own NLP models for computational linguistics
⦁ Apply statistical learning and topic modeling algorithms to text using Gensim and scikit-learn
⦁ Employ deep learning techniques for text analysis using Keras

▶Key Features
⦁ Discover the open source Python text analysis ecosystem, using spaCy, Gensim, scikit-learn, and Keras
⦁ Hands-on text analysis with Python, featuring natural language processing and computational linguistics algorithms
⦁ Learn deep learning techniques for text analysis

▶Who This Book Is For
Fluency in Python is assumed, but the book attempts to be accessible to even Python beginners. Basic statistics is helpful. Given that this book introduces Natural Language Processing from first principles, it helps, although it is not a requirement, to be familiar with basic linguistics.

▶What this book covers
⦁ Chapter 1, What is Text Analysis? There is no time like now to do text analysis: we have an abundance of easily available data, powerful and free open source tools to conduct our analysis, and research on Machine Learning, Computational Linguistics, and computing with text is progressing at a pace we have not seen before. In this chapter, we will go into detail about what exactly text analysis is, and the motivations for studying and understanding it.

⦁ Chapter 2, Python Tips for Text Analysis. We mentioned in Chapter 1, What is Text Analysis?, that we will be using Python throughout the book because it is an easy-to-use and powerful language. In this chapter, we will substantiate these claims, while also providing a revision course in basic Python for text analysis. Why is this important? While we expect readers of the book to have a background in Python and high-school math, it is still possible that it’s been a while since you’ve written Python code - and even if you have, the Python code you write for text analysis and string manipulation is quite different from, say, building a website using the web framework Django.
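
As a taste of the kind of plain-Python string handling this chapter revises, here is a minimal sketch using only the standard library; the sample sentence and stopword list are illustrative and not taken from the book.

```python
import string

# Illustrative sentence; not from the book.
text = "The quick brown fox jumps over the lazy dog."

# Lowercase, strip punctuation, and split into tokens.
tokens = text.lower().translate(str.maketrans("", "", string.punctuation)).split()

# Drop a few common stopwords with a list comprehension.
stopwords = {"the", "over", "a", "an"}
tokens = [tok for tok in tokens if tok not in stopwords]
print(tokens)  # ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']
```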

⦁ Chapter 3, spaCy’s Language Models. While we introduced text analysis in the previous chapter, we did not discuss any of the technical details behind building a text analysis pipeline. In this chapter, we will introduce you to spaCy’s language models - these will serve as the first step in text analysis, and are the first building block in our pipelines. We will also introduce spaCy itself and how we can use it in our text analysis tasks, as well as some of its more powerful functionalities, such as POS-tagging and NER-tagging. We will finish with an example of how we can preprocess data quickly and efficiently using spaCy.
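
A minimal sketch of loading a spaCy language model and running its pipeline over a sentence is shown below; it assumes the small English model en_core_web_sm has been installed (python -m spacy download en_core_web_sm), and the sentence is illustrative.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # load the pretrained English pipeline
doc = nlp("The founders of the company met in Lille in 2018.")

# Every token carries the annotations produced by the pipeline.
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.is_stop)
```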

⦁ Chapter 4, Gensim – Vectorizing Text and Transformations and n-grams. While we have worked with raw textual data so far, Machine Learning and information retrieval algorithms will not accept data in this form - which is why we use mathematical constructs called vectors to help the algorithms make sense of the text. We will introduce Gensim as the tool to conduct this transformation, as well as scikit-learn, which will be used before we plug the text into any sort of further analysis. A large part of preprocessing carries over into vectorization: bi-grams, tri-grams, and n-grams, as well as using term frequencies to get rid of words we deem not to be useful.
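
The sketch below shows the basic Gensim pattern for turning tokenised documents into bag-of-words and TF-IDF vectors; the three tiny documents are illustrative placeholders, and bigram detection with gensim.models.Phrases follows the same corpus-in, transformed-corpus-out pattern.

```python
from gensim.corpora import Dictionary
from gensim.models import TfidfModel

texts = [
    ["text", "analysis", "with", "python"],
    ["topic", "models", "for", "text"],
    ["python", "tools", "for", "topic", "models"],
]

dictionary = Dictionary(texts)                       # word <-> id mapping
corpus = [dictionary.doc2bow(doc) for doc in texts]  # sparse bag-of-words vectors
tfidf = TfidfModel(corpus)                           # reweight counts by TF-IDF

print(tfidf[corpus[0]])  # list of (word id, weight) pairs for the first document
```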

⦁ Chapter 5, POS-Tagging and Its Applications. Chapters 1 and 2 introduced text analysis and Python, and Chapters 3 and 4 helped us set up our code for more advanced text analysis. This chapter discusses the first of these advanced techniques - part of speech tagging, popularly called POS-tagging. We will study what parts of speech exist, how to identify them in our documents, and what possible uses these POS-tags have.
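
A minimal POS-tagging sketch with spaCy, assuming the en_core_web_sm model is installed; the sentence is illustrative.

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The striped bats were hanging on their feet.")

for token in doc:
    # pos_ is the coarse universal tag, tag_ the fine-grained treebank tag.
    print(token.text, token.pos_, token.tag_)
```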

⦁ Chapter 6, NER-Tagging and Its Applications. In the previous chapter, we saw how we can use spaCy’s language pipeline - POS-tagging is a very powerful tool, and we will now explore another interesting usage, NER-tagging. We will discuss what exactly this is from both a linguistic and a text analysis point of view, detail examples of its usage, and learn how to train our own NER-tagger with spaCy.
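
A minimal NER-tagging sketch with spaCy is shown below; the entity labels (for example PERSON, ORG, GPE) come from the pretrained en_core_web_sm model, and the sentence is illustrative.

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Google was founded in California by Larry Page and Sergey Brin.")

# doc.ents holds the named-entity spans found by the pipeline.
for ent in doc.ents:
    print(ent.text, ent.label_)
```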

⦁ Chapter 7, Dependency Parsing. We saw in Chapters 5 and 6 how spaCy’s language pipeline performs a variety of complex Computational Linguistics algorithms, such as POS-tagging and NER-tagging. This isn’t all spaCy packs though, and in this chapter we will explore the power of dependency parsing and how it can be used in a variety of contexts and applications. We will have a look at the theory of dependency parsing before moving on to using it with spaCy, as well as training our own dependency parsers.
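
A minimal sketch of inspecting the dependency parse spaCy produces, again assuming en_core_web_sm is installed and using an illustrative sentence.

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The cat chased the mouse across the garden.")

for token in doc:
    # dep_ is the dependency relation, head is the token this word attaches to.
    print(token.text, token.dep_, token.head.text)
```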

⦁ Chapter 8, Topic Models. Until now, we have dealt with Computational Linguistics algorithms and spaCy, and understood how to use these algorithms to annotate our data and understand sentence structure. While they helped us understand the finer details of our text, we still didn’t get a big picture of our data - what kinds of words appear more often than others in our corpus? Can we group our data or find underlying themes? We will attempt to answer these questions and more in this chapter.
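
A minimal sketch of training an LDA topic model with Gensim, reusing the Dictionary/doc2bow pattern from Chapter 4; the tiny corpus and num_topics value are illustrative placeholders.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel

texts = [
    ["cats", "dogs", "pets", "animals"],
    ["python", "code", "text", "analysis"],
    ["dogs", "animals", "bark"],
    ["text", "python", "nlp"],
]

dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(doc) for doc in texts]

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=10)

# Each topic is a weighted list of words; formatted=False returns (word, prob) pairs.
for topic_id, words in lda.show_topics(num_topics=2, num_words=4, formatted=False):
    print(topic_id, [word for word, prob in words])
```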

⦁ Chapter 9, Advanced Topic Modeling. We saw in the previous chapter the power of topic modeling and how intuitive a way it can be to understand our data, as well as to explore it. In this chapter, we will further explore the utility of these topic models, and also how to create more useful topic models that better encapsulate the topics present in a corpus. Since topic modeling is a way to understand the documents of a corpus, it also means that we can analyze documents in ways we have not done before.
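
One way to judge whether a topic model is "useful" is topic coherence; the sketch below uses Gensim's CoherenceModel (u_mass coherence, which needs only the corpus) and reads a per-document topic mixture. The tiny corpus is an illustrative placeholder, not an example from the book.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel, CoherenceModel

texts = [
    ["cats", "dogs", "pets"],
    ["python", "text", "analysis"],
    ["dogs", "animals", "bark"],
    ["text", "python", "nlp"],
]
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(doc) for doc in texts]
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=10)

# Coherence score: values closer to zero (u_mass) usually mean more interpretable topics.
coherence = CoherenceModel(model=lda, corpus=corpus, dictionary=dictionary, coherence="u_mass")
print("coherence:", coherence.get_coherence())

# Topic mixture of a single document: a list of (topic id, probability) pairs.
print(lda.get_document_topics(corpus[0]))
```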

⦁ Chapter 10, Clustering and Classifying Text. In the previous chapter, we studied topic models and how they can help us organize and better understand our documents and their sub-structure. We will now move on to our next set of Machine Learning algorithms, for two particular tasks - clustering and classification. We will learn the intuitive reasoning behind these two tasks, as well as how to perform them using the popular Python Machine Learning library scikit-learn.
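
A minimal sketch of both tasks with scikit-learn: TF-IDF vectors feeding KMeans for clustering and a Naive Bayes classifier for classification. The four documents and their labels are illustrative placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.naive_bayes import MultinomialNB

docs = [
    "cats and dogs are pets",
    "python code for text analysis",
    "dogs bark at cats",
    "text analysis with python tools",
]
labels = [0, 1, 0, 1]  # e.g. 0 = animals, 1 = programming

X = TfidfVectorizer().fit_transform(docs)

# Unsupervised: group the documents into two clusters.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)

# Supervised: train a simple Naive Bayes classifier on the same vectors.
clf = MultinomialNB().fit(X, labels)
print(clf.predict(X))
```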

⦁ Chapter 11, Similarity Queries and Summarization. Once we have begun to represent text documents in the form of vector representations, it is possible to start finding the similarity or distance between documents - and that is exactly what we will learn about in this chapter. We are now aware of a variety of different vector representations, from standard bag-of-words or TF-IDF to topic model representations of text documents. We will also learn about a very useful feature implemented in gensim and how to use it - summarization and keyword extraction.
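
A minimal sketch of a similarity query with Gensim: build a TF-IDF index over a tiny illustrative corpus and rank its documents against a query. The summarization and keyword-extraction helpers the chapter also covers lived in the gensim.summarization module of the Gensim 3.x releases the book targets (they were removed in Gensim 4.0), so they are omitted here.

```python
from gensim.corpora import Dictionary
from gensim.models import TfidfModel
from gensim import similarities

texts = [
    ["text", "analysis", "with", "python"],
    ["topic", "models", "for", "text"],
    ["python", "tools", "for", "topic", "models"],
]
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(doc) for doc in texts]
tfidf = TfidfModel(corpus)

# Index the TF-IDF vectors of the whole corpus for cosine-similarity lookups.
index = similarities.MatrixSimilarity(tfidf[corpus], num_features=len(dictionary))

query = dictionary.doc2bow(["python", "text"])
print(list(enumerate(index[tfidf[query]])))  # (document id, similarity) pairs
```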

⦁ Chapter 12, Word2Vec, Doc2Vec and Gensim. We have talked about vectors a lot throughout the book - they are used to understand and represent our textual data in a mathematical form, and all the Machine Learning methods we use rely on these representations. We will take this one step further and use Machine Learning techniques to generate vector representations of words that better encapsulate a word's meaning. This technique is generally referred to as word embeddings, and Word2Vec and Doc2Vec are two popular variations.
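
A minimal Word2Vec sketch with Gensim; the tiny corpus is an illustrative placeholder, and the keyword arguments follow the Gensim 4.x names (vector_size, epochs), which were size and iter in the 3.x releases the book was written against.

```python
from gensim.models import Word2Vec

sentences = [
    ["text", "analysis", "with", "python"],
    ["word", "embeddings", "capture", "meaning"],
    ["python", "tools", "for", "text", "analysis"],
]

# Train a small embedding model; min_count=1 keeps every word in this toy vocabulary.
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=50)

# Words whose learned vectors end up closest to "python".
print(model.wv.most_similar("python", topn=3))
```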

⦁ Chapter 13, Deep Learning for Text. Until now, we have explored the usage of Machine Learning for text in a variety of contexts - topic modelling, clustering, classification, text summarisation, and even our POS-taggers and NER-taggers were trained using Machine Learning. In this chapter, we will begin to explore one of the most cutting-edge forms of Machine Learning - Deep Learning. Deep Learning is a form of ML where we use biologically inspired structures to generate algorithms and architectures to perform various tasks on text. Some of these tasks are text generation, classification, and word embeddings. In this chapter, we will discuss some of the underpinnings of Deep Learning as well as how to implement our own Deep Learning models for text.

⦁ Chapter 14, Keras and spaCy for Deep Learning. In the previous chapter, we introduced Deep Learning techniques for text, and to get a taste of using Neural Networks, we attempted to generate text using an RNN. In this chapter, we will take a closer look at Deep Learning for text, and in particular, how to set up a Keras model which can perform classification, as well as how to incorporate Deep Learning into spaCy pipelines.
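
A minimal sketch of the kind of Keras classifier described here: an Embedding layer feeding an LSTM and a sigmoid output for binary labels. The vocabulary size, sequence length, and random training data are illustrative placeholders, not the book's dataset.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

vocab_size, maxlen = 5000, 100

model = Sequential([
    Embedding(input_dim=vocab_size, output_dim=64),  # integer ids -> dense vectors
    LSTM(32),                                        # sequence -> fixed-size representation
    Dense(1, activation="sigmoid"),                  # binary class probability
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Dummy data: integer-encoded, padded sequences with binary labels.
X = np.random.randint(1, vocab_size, size=(200, maxlen))
y = np.random.randint(0, 2, size=(200,))
model.fit(X, y, epochs=1, batch_size=32)
```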

⦁ Chapter 15, Sentiment Analysis and ChatBots. By now, we are equipped with the skills needed to get started on text analysis projects and to take a shot at more complicated, meatier projects. Two common text analysis projects that encapsulate a lot of the concepts we have explored throughout the book are sentiment analysis and chatbots. In fact, we’ve already touched upon all the methods we will be using for these projects, and this chapter will serve as a guide to how one can put together such an application on their own. In this chapter, we will not provide the code to build a chatbot or sentiment analysis pipeline from the first step to the last, but will rather introduce the reader to a variety of techniques that will help when setting up such a project.

About the Author

⦁ Bhargav Srinivasa-Desikan
Bhargav Srinivasa-Desikan is a research engineer working for INRIA in Lille, France. He is a part of the MODAL (Models of Data Analysis and Learning) team, and has a deep interest in modern text analysis. He works on metric learning, predictor aggregation, and data visualization. He is a regular contributor to the Python open source community, and completed Google Summer of Code in 2016 with Gensim where he implemented Dynamic Topic Models. He is a regular speaker at PyCons and PyDatas across Europe and Asia, and conducts tutorials on text analysis using Python.
