
Getting Started with NLTK

An Introduction to NLTK
Sreejith S
srssreejith@gmail.com
@tweet2sree
FOSSMeet 2011, NIC Calicut
06 February 2011
Just a word about me!
Working in Natural Language Processing (NLP), Machine Learning, Text Mining
Active member of ilugcbe, http://ilugcbe.techstud.org
Works for 365Media Pvt. Ltd., Coimbatore, India
@tweet2sree, srssreejith@gmail.com
Introduction - NLP
Natural Language Processing
NLP is an inter-disciplinary subject
Computer Science
Linguistics
Statistics etc...
NLP is a subfield of Artificial Intelligence
NLP - Any kind of computer manipulation of natural language
It is a rapidly developing field of study
Everyday applications of NLP
Handwriting recognition, Machine translation, Question-answering systems, Spell checkers, Grammar checkers etc...
Natural Language Toolkit (NLTK)
A collection of Python programs, modules, data sets and tutorials to support research and development in Natural Language Processing (NLP)
Written by Steven Bird, Edward Loper and Ewan Klein
NLTK is
Free and Open source
Easy to use
Modular
Well documented
Simple and extensible
http://www.nltk.org
What You Will Learn
How simple programs can help you manipulate and analyze language
data, and how to write these programs
How key concepts from NLP and linguistics are used to describe and
analyze language
How data structures and algorithms are used in NLP
How language data is stored in standard formats, and how data can
be used to evaluate the performance of NLP techniques
Installation of NLTK
Make sure that Python 2.4, 2.5 or 2.6 is available on your system
Install the Python Tkinter package
Install NumPy, Matplotlib, Prover9, MaltParser and MegaM
Download NLTK and install it
If you are installing NLTK from source, download
http://nltk.googlecode.com/files/nltk-2.0b9.zip
Unzip it; this will create nltk-2.0b9
Open a terminal, cd into this folder, become super user and run python setup.py install
To install the data
Start the Python interpreter
>>> import nltk
>>> nltk.download()
Now you are ready to play with NLTK !!!
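Aside: nltk.download() with no arguments opens a graphical downloader; it also accepts a package or collection identifier, which is handy on a machine without a display. A minimal sketch, assuming the 'book' collection (the data used in the NLTK book examples) is available in the data index:
>>> import nltk
>>> nltk.download('book')   # fetch the corpora used in the book examples, no GUI needed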
NLTK Modules
NLTK Modules                 Functionality
nltk.corpus                  Corpora
nltk.tokenize, nltk.stem     Tokenizers, stemmers
nltk.collocations            t-test, chi-squared, mutual-info
nltk.tag                     n-gram, backoff, Brill, HMM, TnT
nltk.classify, nltk.cluster  Decision tree, Naive Bayes, K-means
nltk.chunk                   Regex, n-gram, named entity
nltk.parse                   Parsing
nltk.sem, nltk.inference     Semantic interpretation
nltk.metrics                 Evaluation metrics
nltk.probability             Probability & estimation
nltk.app, nltk.chat          Applications
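As a small taste of how these modules combine (a minimal sketch; the example sentence is made up for illustration), tokenize a string with nltk.tokenize and stem each token with nltk.stem:
>>> from nltk.tokenize import word_tokenize
>>> from nltk.stem import PorterStemmer
>>> stemmer = PorterStemmer()
>>> tokens = word_tokenize("NLTK provides tokenizers and stemmers")
>>> [stemmer.stem(t) for t in tokens]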
Let us start the game
To access data for working out the examples in the book
Start the Python interpreter
Some basic workouts from the book
Concordance
>>> from nltk.book import *
>>> text1.concordance("monstrous")
Similar
>>> text1.similar("monstrous")
Dispersion plot - Positional information
>>> text4.dispersion_plot(["citizens", "democracy", "freedom", "duties", "America"])
>>> text4.dispersion_plot(["and", "to", "of", "with", "the"])
What is it? Why?
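Two more exploratory calls worth trying on the same texts (a quick sketch; the exact output depends on the loaded corpus):
>>> text2.common_contexts(["monstrous", "very"])
>>> text2.similar("monstrous")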
Continued...
Some basic workouts from the book
Generate
>>> text3.generate()
Counting Vocabulary
>>> len(text3)
List of distinct words, sorted in dictionary order
>>> sorted(set(text3))
Count occurrences of a particular word in a text
>>> text3.count("and")
What percentage of the text is taken up by a specific word
>>> 100 * text3.count("and") / len(text3)
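Note: under Python 2 the division above truncates to a whole number; a future import restores true division. A short sketch of that workaround, together with a book-style lexical diversity measure:
>>> from __future__ import division
>>> 100 * text3.count("and") / len(text3)
>>> def lexical_diversity(text):
...     return len(text) / len(set(text))
...
>>> lexical_diversity(text3)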
Collocation & Bigram
Collocation
A collocation is a sequence of words that occur together unusually often
e.g. red wine, strong tea
But "strong computer" is not a collocation
>>> text4.collocations()
Bigrams
List of word pairs
>>> text = "sreejith is talking about NLTK"
>>> wordlist = text.split()
>>> bigrams(wordlist)
What will happen if I do it like this?
>>> bigrams(text)
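(A hint for the question above, as a rough sketch: a Python string is itself a sequence of characters, so passing the raw string gives character pairs rather than word pairs.)
>>> list(bigrams(text))[:3]   # pairs of letters such as ('s', 'r') and ('r', 'e')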
Work with our own data
Populate our own corpora with NLTK and analyse them
>>> from nltk.corpus import PlaintextCorpusReader as ptr
>>> corpus = '/home/developer/Desktop/Sreejith'
>>> wordlist = ptr(corpus, '.*')
>>> wordlist.fileids()
Let us try to find out how to count the number of characters, words and sentences in the corpus
>>> for fid in wordlist.fileids():
...     print len(wordlist.raw(fid))
>>> for fid in wordlist.fileids():
...     print len(wordlist.words(fid))
>>> for fid in wordlist.fileids():
...     print len(wordlist.sents(fid))
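Building on the three loops above, a small sketch (assuming the wordlist reader defined earlier and Python 2's print statement) that reports average word and sentence length per file:
>>> from __future__ import division
>>> for fid in wordlist.fileids():
...     nchars = len(wordlist.raw(fid))
...     nwords = len(wordlist.words(fid))
...     nsents = len(wordlist.sents(fid))
...     print fid, nchars / nwords, nwords / nsents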
Continued...
Plotting a conditional frequency distribution
>>> text = "sreejith is talking about NLTK"
>>> words = text.split()
>>> big = bigrams(words)
>>> gd = nltk.ConditionalFreqDist(big)
>>> gd.plot()
Tabulate a CFD
>>> gd.tabulate()
Plot a frequency distribution
>>> fdist = FreqDist(text1)
>>> fdist.plot(50, cumulative=True)
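The same ConditionalFreqDist idea scales from a toy sentence to a real corpus; a sketch in the style of the book, conditioning Brown corpus words on their genre:
>>> from nltk.corpus import brown
>>> cfd = nltk.ConditionalFreqDist(
...     (genre, word)
...     for genre in brown.categories()
...     for word in brown.words(categories=genre))
>>> cfd.tabulate(conditions=['news', 'romance'],
...              samples=['the', 'could', 'love'])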
Normalizing Text
Stemming
Stemming is the process of reducing inflected (or sometimes derived) words to their stem, base or root form, generally a written word form
>>> porter = nltk.PorterStemmer()
>>> word = 'running'
>>> porter.stem(word)
>>> lancaster = nltk.LancasterStemmer()
>>> lancaster.stem(word)
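The two stemmers often disagree; a quick comparison over a handful of sample words (the word list is just for illustration):
>>> words = ['running', 'flies', 'maximum', 'presumably']
>>> [porter.stem(w) for w in words]
>>> [lancaster.stem(w) for w in words]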
Normalizing Text
Lemmatization
Stemming + making sure that the resulting form is a known word in a dictionary
>>> wnl = nltk.WordNetLemmatizer()
>>> wnl.lemmatize(word)
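The WordNet lemmatizer treats its input as a noun unless told otherwise; passing a part of speech can change the result. A small sketch:
>>> wnl.lemmatize('running')           # treated as a noun, so left as 'running'
>>> wnl.lemmatize('running', pos='v')  # treated as a verb, reduced to 'run'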
POS Tagging
POS Tagging
The process of classifying words into their parts of speech and labeling them accordingly is known as part-of-speech tagging, or POS tagging
>>> text = nltk.word_tokenize("we are attending FOSS meet at NIC calicut")
>>> nltk.pos_tag(text)
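pos_tag returns a list of (word, tag) tuples in the Penn Treebank tagset, and the meaning of any tag can be looked up from within NLTK. A quick sketch:
>>> tagged = nltk.pos_tag(text)
>>> tagged[:3]                     # e.g. [('we', 'PRP'), ('are', 'VBP'), ...]
>>> nltk.help.upenn_tagset('VBP')  # description and examples for the VBP tag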
Parsing
Sentence Parsing
Analyzing sentence structure and creating a parse tree
>>> sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
...             ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
>>> grammar = "NP: {<DT>?<JJ>*<NN>}"
>>> cp = nltk.RegexpParser(grammar)
>>> result = cp.parse(sentence)
>>> print result
>>> result.draw()
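The chunker also works on a freshly tagged sentence, so tokenizing, tagging and chunking can be chained; a sketch assuming the cp parser defined above:
>>> raw = "the little yellow dog barked at the cat"
>>> tagged = nltk.pos_tag(nltk.word_tokenize(raw))
>>> tree = cp.parse(tagged)
>>> print tree
>>> tree.draw()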
Machine Translation
Babelizer Shell
Translating a sentence from its source language to a specified language.
NLTK provides a babelize shell
>>> babelize_shell()
Babel> hello how are you?
Babel> german
Babel> run
Also try Google Translate and Yahoo! Babel Fish
What you can do
Contribute to NLTK
GSoC
NLP training
Real-time research
Reference
Steven Bird, Edward Loper and Ewan Klein
Natural Language Processing with Python
Jacob Perkins
Python Text Processing with NLTK 2.0 Cookbook
http://www.nltk.org
Questions
And finally...
Sreejith.S
