Setting Up an NLTK Environment

NLTK (Natural Language Toolkit) is a widely used natural language processing package for Python. This post walks through setting up an NLTK environment.

First, go to http://www.lfd.uci.edu/~gohlke/pythonlibs/#python-dateutil (which hosts nearly every Python extension package), find NLTK, PyYAML, NumPy, and Matplotlib, then download and install them.
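
Before moving on, it can help to confirm that all four packages are importable. The following is a minimal sketch using only the standard library; the names `REQUIRED` and `check_packages` are made up for this illustration and are not part of NLTK. Note that PyYAML is imported under the module name `yaml`.

```python
from importlib.util import find_spec

# Display name -> import name. PyYAML is imported as "yaml".
REQUIRED = {
    "NLTK": "nltk",
    "PyYAML": "yaml",
    "NumPy": "numpy",
    "Matplotlib": "matplotlib",
}


def check_packages(required=REQUIRED):
    """Return {display name: bool} telling whether each package can be imported."""
    return {
        name: find_spec(module) is not None
        for name, module in required.items()
    }


if __name__ == "__main__":
    for name, ok in check_packages().items():
        print(f"{name:<12}{'installed' if ok else 'MISSING'}")
```

Any package reported as MISSING needs to be installed again before continuing.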
Then open the Python command line and enter:
```
import nltk  
nltk.download()
```
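Select all, set the download path (Download Directory), and click Download. NLTK then starts downloading its data packages. This takes a long time, so be patient. If a particular package fails to download, switch to the All Packages tab and double-click that package to download it individually.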
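
If the graphical downloader is inconvenient, `nltk.download` also accepts a package or collection id directly. Here is a rough sketch of a helper that downloads a list of ids and reports which ones failed, so they can be retried. The helper name `download_packages` and its `downloader` parameter are assumptions made for this example (the parameter exists only so the function can be exercised without network access); `nltk.download` and its `download_dir` argument are the real NLTK API.

```python
def download_packages(package_ids, download_dir=None, downloader=None):
    """Download each NLTK data package by id and return the ids that failed.

    `downloader` defaults to nltk.download; it is a parameter only so that
    this sketch can be tried with a stand-in function.
    """
    if downloader is None:
        import nltk  # imported lazily, so NLTK is only needed for real downloads
        downloader = nltk.download

    failed = []
    for package_id in package_ids:
        # nltk.download returns True on success and False on failure.
        ok = downloader(package_id, download_dir=download_dir)
        if not ok:
            failed.append(package_id)
    return failed
```

For example, `download_packages(["book"])` would request the collection used by `nltk.book`, and `download_packages(["gutenberg"])` only the Gutenberg corpus. Passing `download_dir=None` leaves the choice of directory to NLTK.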

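If that still does not work, you can download the data packages directly from http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml and copy them into your Download Directory. In my case, I copied the nltk_data folder into the Python27 installation directory.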
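
When copying data by hand, the folder has to end up somewhere NLTK actually searches. The sketch below lists a few of those locations using only the standard library. It is an approximation, not the full list: the authoritative list is `nltk.data.path` on your own installation. The function names `candidate_data_dirs` and `find_data_dirs` are invented for this example; the `NLTK_DATA` environment variable is a real NLTK setting.

```python
import os
import sys


def candidate_data_dirs():
    """Approximate some of the directories NLTK searches for nltk_data."""
    dirs = []
    # Directories named in NLTK_DATA (separated by os.pathsep) are searched first.
    dirs += [d for d in os.environ.get("NLTK_DATA", "").split(os.pathsep) if d]
    # A per-user folder in the home directory.
    dirs.append(os.path.join(os.path.expanduser("~"), "nltk_data"))
    # A folder inside the Python installation, e.g. the Python27 directory.
    dirs.append(os.path.join(sys.prefix, "nltk_data"))
    return dirs


def find_data_dirs():
    """Return only the candidate directories that exist on this machine."""
    return [d for d in candidate_data_dirs() if os.path.isdir(d)]


if __name__ == "__main__":
    for d in candidate_data_dirs():
        print(("found    " if os.path.isdir(d) else "missing  ") + d)
```

Copying nltk_data into the Python installation directory, as described above, corresponds to the `sys.prefix` entry.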

Verifying the NLTK installation
```
from nltk.book import *
```

If entering the code above produces the following output, the NLTK data packages are installed correctly.


*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9  
Type the name of the text or sentence to view it.  
Type: 'texts()' or 'sents()' to list the materials.  
text1: Moby Dick by Herman Melville 1851  
text2: Sense and Sensibility by Jane Austen 1811  
text3: The Book of Genesis  
text4: Inaugural Address Corpus  
text5: Chat Corpus  
text6: Monty Python and the Holy Grail  
text7: Wall Street Journal  
text8: Personals Corpus  
text9: The Man Who Was Thursday by G . K . Chesterton 1908

Practice

Searching a text with NLTK


text1.concordance('monstrous')

Explanation:


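text1 is one of the sample texts shipped with the NLTK data packages: the tokenized text of Moby Dick wrapped in an NLTK Text object, rather than one long string. (The original is melville-moby_dick.txt inside gutenberg.zip in the data download directory.)  
text1.concordance('monstrous') finds every occurrence of the word monstrous in that text and prints each one together with its surrounding context.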
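
To make the idea concrete, here is a simplified pure-Python sketch of what a concordance lookup does. It is not NLTK's implementation: NLTK's own output is aligned by character width, while this version simply takes a fixed number of tokens on each side. The function name `simple_concordance` and the sample sentence are invented for this illustration.

```python
def simple_concordance(tokens, word, width=4):
    """Return one context line per occurrence of `word` in `tokens`.

    Matching is case-insensitive; `width` is the number of tokens
    shown on each side of the match.
    """
    target = word.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


# A made-up sentence, split on whitespace as a stand-in for real tokenization.
sample = "The whale was of a most monstrous size , a monstrous creature indeed".split()

for line in simple_concordance(sample, "monstrous", width=3):
    print(line)
```

Each printed line shows the matched word in brackets with up to three tokens of context on either side.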

Notes:


NLTK: a natural language processing toolkit  
PyYAML: a YAML parser  
NumPy: multidimensional arrays and linear algebra  
Matplotlib: a 2D plotting library for data visualization

Please credit the source when reposting: 宁哥的小站 » Setting Up an NLTK Environment

