Natural Language Processing and Computational Linguistics

A practical guide to text analysis with Python, Gensim, spaCy, and Keras

E-book list price: 19,000 KRW
Sale price: 19,000 KRW
Publication: e-book released 2018.06.29
Listening: TTS (text-to-speech) supported
File: PDF, 298 pages, 7.7 MB
Supported environments: PC viewer, PAPER (e-reader)
ISBN: 9781788837033

Book Information

▶Book Description
Modern text analysis is now very accessible with Python and open source tools; discover how you can perform it in this era of abundant textual data.

This book shows you how to use natural language processing and computational linguistics algorithms to make inferences and gain insights from the data you have. These algorithms are based on statistical machine learning and artificial intelligence techniques. The tools to work with them are available to you right now, with Python and libraries such as Gensim and spaCy.

You'll start by learning about data cleaning, and then how to perform computational linguistics from first concepts. You're then ready to explore the more sophisticated areas of statistical NLP and deep learning using Python, with realistic language and text samples. You'll learn to tag, parse, and model text using the best tools. You'll gain hands-on knowledge of the best frameworks to use, and you'll know when to choose a tool like Gensim for topic models, and when to work with Keras for deep learning.

This book balances theory and practical hands-on examples, so you can learn about and conduct your own natural language processing projects and computational linguistics. You'll discover the rich ecosystem of Python tools you have available to conduct NLP - and enter the interesting world of modern text analysis.

▶What You Will Learn
⦁ Understand why text analysis is important in our modern age
⦁ Understand NLP terminology and get to know the Python tools and datasets
⦁ Learn how to pre-process and clean textual data
⦁ Convert textual data into vector space representations
⦁ Use spaCy to process text
⦁ Train your own NLP models for computational linguistics
⦁ Apply statistical learning and topic modeling algorithms to text using Gensim and scikit-learn
⦁ Employ deep learning techniques for text analysis using Keras

▶Key Features
⦁ Discover the open source Python text analysis ecosystem, using spaCy, Gensim, scikit-learn, and Keras
⦁ Hands-on text analysis with Python, featuring natural language processing and computational linguistics algorithms
⦁ Learn deep learning techniques for text analysis

▶Who This Book Is For
Fluency in Python is assumed, but the book attempts to be accessible to even Python beginners. Basic statistics is helpful. Given that this book introduces Natural Language Processing from first principles, it helps, although it is not a requirement, to be familiar with basic linguistics.

▶What this book covers
⦁ Chapter 1, What is Text Analysis? There is no time like now to do text analysis: we have an abundance of easily available data, powerful and free open source tools to conduct our analysis, and research on Machine Learning, Computational Linguistics, and computing with text is progressing at a pace we have not seen before. In this chapter, we will go into detail about what exactly text analysis is, and the motivations for studying and understanding it.

⦁ Chapter 2, Python Tips for Text Analysis. We mentioned in Chapter 1, What is Text Analysis?, that we will be using Python throughout the book because it is an easy-to-use and powerful language. In this chapter, we will substantiate these claims, while also providing a revision course in basic Python for text analysis. Why is this important? While we expect readers of the book to have a background in Python and high-school math, it is still possible that it’s been a while since you’ve written Python code - and even if you have, the Python code you write for text analysis and string manipulation is quite different from, say, building a website using the web framework Django.
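
As a taste of the kind of plain-Python string handling this chapter revises, here is a minimal sketch using only the standard library; the sample sentence and stopword list are illustrative and not taken from the book.

```python
import string

# Illustrative sentence; not from the book.
text = "The quick brown fox jumps over the lazy dog."

# Lowercase, strip punctuation, and split into tokens.
tokens = text.lower().translate(str.maketrans("", "", string.punctuation)).split()

# Drop a few common stopwords with a list comprehension.
stopwords = {"the", "over", "a", "an"}
tokens = [tok for tok in tokens if tok not in stopwords]
print(tokens)  # ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']
```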

⦁ Chapter 3, spaCy’s Language Models. While we introduced text analysis in the previous chapter, we did not discuss any of the technical details behind building a text analysis pipeline. In this chapter, we will introduce you to spaCy’s language models - these will serve as the first step in text analysis, and are the first building block in our pipelines. We will also introduce spaCy itself and how we can use it in our text analysis tasks, as well as some of its more powerful functionalities, such as POS-tagging and NER-tagging. We will finish with an example of how we can preprocess data quickly and efficiently using spaCy.
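
A minimal sketch of loading a spaCy language model and running its pipeline over a sentence is shown below; it assumes the small English model en_core_web_sm has been installed (python -m spacy download en_core_web_sm), and the sentence is illustrative.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # load the pretrained English pipeline
doc = nlp("The founders of the company met in Lille in 2018.")

# Every token carries the annotations produced by the pipeline.
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.is_stop)
```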

⦁ Chapter 4, Gensim – Vectorizing Text and Transformations and n-grams. While we have worked with raw textual data so far, Machine Learning and information retrieval algorithms will not accept data in this form - which is why we use mathematical constructs called vectors to help the algorithms make sense of the text. We will introduce Gensim as the tool to conduct this transformation, as well as scikit-learn, which will be used before we plug the text into any sort of further analysis. A large part of preprocessing carries over into vectorization: bi-grams, tri-grams, and n-grams, as well as using term frequencies to get rid of words we deem not to be useful.
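
The sketch below shows the basic Gensim pattern for turning tokenised documents into bag-of-words and TF-IDF vectors; the three tiny documents are illustrative placeholders, and bigram detection with gensim.models.Phrases follows the same corpus-in, transformed-corpus-out pattern.

```python
from gensim.corpora import Dictionary
from gensim.models import TfidfModel

texts = [
    ["text", "analysis", "with", "python"],
    ["topic", "models", "for", "text"],
    ["python", "tools", "for", "topic", "models"],
]

dictionary = Dictionary(texts)                       # word <-> id mapping
corpus = [dictionary.doc2bow(doc) for doc in texts]  # sparse bag-of-words vectors
tfidf = TfidfModel(corpus)                           # reweight counts by TF-IDF

print(tfidf[corpus[0]])  # list of (word id, weight) pairs for the first document
```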

⦁ Chapter 5, POS-Tagging and Its Applications. Chapters 1 and 2 introduced text analysis and Python, and Chapters 3 and 4 helped us set up our code for more advanced text analysis. This chapter discusses the first of these advanced techniques - part of speech tagging, popularly called POS-tagging. We will study what parts of speech exist, how to identify them in our documents, and what possible uses these POS-tags have.
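
A minimal POS-tagging sketch with spaCy, assuming the en_core_web_sm model is installed; the sentence is illustrative.

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The striped bats were hanging on their feet.")

for token in doc:
    # pos_ is the coarse universal tag, tag_ the fine-grained treebank tag.
    print(token.text, token.pos_, token.tag_)
```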

⦁ Chapter 6, NER-Tagging and Its Applications. In the previous chapter, we saw how we can use spaCy’s language pipeline - POS-tagging is a very powerful tool, and we will now explore another interesting usage, NER-tagging. We will discuss what exactly this is from both a linguistic and a text analysis point of view, detail examples of its usage, and learn how to train our own NER-tagger with spaCy.
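
A minimal NER-tagging sketch with spaCy is shown below; the entity labels (for example PERSON, ORG, GPE) come from the pretrained en_core_web_sm model, and the sentence is illustrative.

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Google was founded in California by Larry Page and Sergey Brin.")

# doc.ents holds the named-entity spans found by the pipeline.
for ent in doc.ents:
    print(ent.text, ent.label_)
```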

⦁ Chapter 7, Dependency Parsing. We saw in Chapters 5 and 6 how spaCy’s language pipeline performs a variety of complex Computational Linguistics algorithms, such as POS-tagging and NER-tagging. This isn’t all spaCy packs though, and in this chapter we will explore the power of dependency parsing and how it can be used in a variety of contexts and applications. We will have a look at the theory of dependency parsing before moving on to using it with spaCy, as well as training our own dependency parsers.
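
A minimal sketch of inspecting the dependency parse spaCy produces, again assuming en_core_web_sm is installed and using an illustrative sentence.

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The cat chased the mouse across the garden.")

for token in doc:
    # dep_ is the dependency relation, head is the token this word attaches to.
    print(token.text, token.dep_, token.head.text)
```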

⦁ Chapter 8, Topic Models. Until now, we have dealt with Computational Linguistics algorithms and spaCy, and understood how to use these algorithms to annotate our data and understand sentence structure. While they helped us understand the finer details of our text, we still didn’t get a big picture of our data - what kinds of words appear more often than others in our corpus? Can we group our data or find underlying themes? We will attempt to answer these questions and more in this chapter.
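
A minimal sketch of training an LDA topic model with Gensim, reusing the Dictionary/doc2bow pattern from Chapter 4; the tiny corpus and num_topics value are illustrative placeholders.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel

texts = [
    ["cats", "dogs", "pets", "animals"],
    ["python", "code", "text", "analysis"],
    ["dogs", "animals", "bark"],
    ["text", "python", "nlp"],
]

dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(doc) for doc in texts]

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=10)

# Each topic is a weighted list of words; formatted=False returns (word, prob) pairs.
for topic_id, words in lda.show_topics(num_topics=2, num_words=4, formatted=False):
    print(topic_id, [word for word, prob in words])
```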

⦁ Chapter 9, Advanced Topic Modeling. We saw in the previous chapter the power of topic modeling and how intuitive a way it can be to understand our data, as well as to explore it. In this chapter, we will further explore the utility of these topic models, and also how to create more useful topic models that better encapsulate the topics present in a corpus. Since topic modeling is a way to understand the documents of a corpus, it also means that we can analyze documents in ways we have not done before.
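
One way to judge whether a topic model is "useful" is topic coherence; the sketch below uses Gensim's CoherenceModel (u_mass coherence, which needs only the corpus) and reads a per-document topic mixture. The tiny corpus is an illustrative placeholder, not an example from the book.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel, CoherenceModel

texts = [
    ["cats", "dogs", "pets"],
    ["python", "text", "analysis"],
    ["dogs", "animals", "bark"],
    ["text", "python", "nlp"],
]
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(doc) for doc in texts]
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=10)

# Coherence score: values closer to zero (u_mass) usually mean more interpretable topics.
coherence = CoherenceModel(model=lda, corpus=corpus, dictionary=dictionary, coherence="u_mass")
print("coherence:", coherence.get_coherence())

# Topic mixture of a single document: a list of (topic id, probability) pairs.
print(lda.get_document_topics(corpus[0]))
```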

⦁ Chapter 10, Clustering and Classifying Text. In the previous chapter, we studied topic models and how they can help us organize and better understand our documents and their sub-structure. We will now move on to our next set of Machine Learning algorithms, for two particular tasks - clustering and classification. We will learn the intuitive reasoning behind these two tasks, as well as how to perform them using the popular Python Machine Learning library scikit-learn.
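
A minimal sketch of both tasks with scikit-learn: TF-IDF vectors feeding KMeans for clustering and a Naive Bayes classifier for classification. The four documents and their labels are illustrative placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.naive_bayes import MultinomialNB

docs = [
    "cats and dogs are pets",
    "python code for text analysis",
    "dogs bark at cats",
    "text analysis with python tools",
]
labels = [0, 1, 0, 1]  # e.g. 0 = animals, 1 = programming

X = TfidfVectorizer().fit_transform(docs)

# Unsupervised: group the documents into two clusters.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)

# Supervised: train a simple Naive Bayes classifier on the same vectors.
clf = MultinomialNB().fit(X, labels)
print(clf.predict(X))
```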

⦁ Chapter 11, Similarity Queries and Summarization. Once we have begun to represent text documents in the form of vector representations, it is possible to start finding the similarity or distance between documents - and that is exactly what we will learn about in this chapter. We are now aware of a variety of different vector representations, from standard bag-of-words or TF-IDF to topic model representations of text documents. We will also learn about a very useful feature implemented in gensim and how to use it - summarization and keyword extraction.
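
A minimal sketch of a similarity query with Gensim: build a TF-IDF index over a tiny illustrative corpus and rank its documents against a query. The summarization and keyword-extraction helpers the chapter also covers lived in the gensim.summarization module of the Gensim 3.x releases the book targets (they were removed in Gensim 4.0), so they are omitted here.

```python
from gensim.corpora import Dictionary
from gensim.models import TfidfModel
from gensim import similarities

texts = [
    ["text", "analysis", "with", "python"],
    ["topic", "models", "for", "text"],
    ["python", "tools", "for", "topic", "models"],
]
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(doc) for doc in texts]
tfidf = TfidfModel(corpus)

# Index the TF-IDF vectors of the whole corpus for cosine-similarity lookups.
index = similarities.MatrixSimilarity(tfidf[corpus], num_features=len(dictionary))

query = dictionary.doc2bow(["python", "text"])
print(list(enumerate(index[tfidf[query]])))  # (document id, similarity) pairs
```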

⦁ Chapter 12, Word2Vec, Doc2Vec and Gensim. We have talked about vectors a lot throughout the book - they are used to understand and represent our textual data in a mathematical form, and all the Machine Learning methods we use rely on these representations. We will take this one step further and use Machine Learning techniques to generate vector representations of words that better encapsulate a word's meaning. This technique is generally referred to as word embeddings, and Word2Vec and Doc2Vec are two popular variations.
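
A minimal Word2Vec sketch with Gensim; the tiny corpus is an illustrative placeholder, and the keyword arguments follow the Gensim 4.x names (vector_size, epochs), which were size and iter in the 3.x releases the book was written against.

```python
from gensim.models import Word2Vec

sentences = [
    ["text", "analysis", "with", "python"],
    ["word", "embeddings", "capture", "meaning"],
    ["python", "tools", "for", "text", "analysis"],
]

# Train a small embedding model; min_count=1 keeps every word in this toy vocabulary.
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=50)

# Words whose learned vectors end up closest to "python".
print(model.wv.most_similar("python", topn=3))
```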

⦁ Chapter 13, Deep Learning for Text. Until now, we have explored the usage of Machine Learning for text in a variety of contexts - topic modelling, clustering, classification, text summarisation, and even our POS-taggers and NER-taggers were trained using Machine Learning. In this chapter, we will begin to explore one of the most cutting-edge forms of Machine Learning - Deep Learning. Deep Learning is a form of ML where we use biologically inspired structures to generate algorithms and architectures to perform various tasks on text. Some of these tasks are text generation, classification, and word embeddings. In this chapter, we will discuss some of the underpinnings of Deep Learning as well as how to implement our own Deep Learning models for text.

⦁ Chapter 14, Keras and spaCy for Deep Learning. In the previous chapter, we introduced Deep Learning techniques for text, and to get a taste of using Neural Networks, we attempted to generate text using an RNN. In this chapter, we will take a closer look at Deep Learning for text, and in particular, how to set up a Keras model which can perform classification, as well as how to incorporate Deep Learning into spaCy pipelines.
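
A minimal sketch of the kind of Keras classifier described here: an Embedding layer feeding an LSTM and a sigmoid output for binary labels. The vocabulary size, sequence length, and random training data are illustrative placeholders, not the book's dataset.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

vocab_size, maxlen = 5000, 100

model = Sequential([
    Embedding(input_dim=vocab_size, output_dim=64),  # integer ids -> dense vectors
    LSTM(32),                                        # sequence -> fixed-size representation
    Dense(1, activation="sigmoid"),                  # binary class probability
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Dummy data: integer-encoded, padded sequences with binary labels.
X = np.random.randint(1, vocab_size, size=(200, maxlen))
y = np.random.randint(0, 2, size=(200,))
model.fit(X, y, epochs=1, batch_size=32)
```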

⦁ Chapter 15, Sentiment Analysis and ChatBots. By now, we are equipped with the skills needed to get started on text analysis projects and to take a shot at more complicated, meatier projects. Two common text analysis projects that encapsulate a lot of the concepts we have explored throughout the book are sentiment analysis and chatbots. In fact, we’ve already touched upon all the methods we will be using for these projects, and this chapter will serve as a guide to how one can put together such an application on their own. In this chapter, we will not provide the code to build a chatbot or sentiment analysis pipeline from the first step to the last, but will rather introduce the reader to a variety of techniques that will help when setting up such a project.

About the Author

⦁ Bhargav Srinivasa-Desikan
Bhargav Srinivasa-Desikan is a research engineer working for INRIA in Lille, France. He is a part of the MODAL (Models of Data Analysis and Learning) team, and has a deep interest in modern text analysis. He works on metric learning, predictor aggregation, and data visualization. He is a regular contributor to the Python open source community, and completed Google Summer of Code in 2016 with Gensim where he implemented Dynamic Topic Models. He is a regular speaker at PyCons and PyDatas across Europe and Asia, and conducts tutorials on text analysis using Python.
