NLTK TweetTokenizer not working (Python)

Time: 2016-11-30 00:29:38

Tags: python nltk

I have installed NLTK and run nltk.download(). However, not all packages were installed (the download gets stuck on panlex_lite).

The problem is that when I try to import TweetTokenizer, I get this error:

  File "create_docs.py", line 7
    from nltk.tokenize import TweetTokenizer
  ImportError: cannot import name TweetTokenizer

How should I deal with this? Cheers!

1 answer:

Answer 0: (score: 0)

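This happens because the libraries were not installed correctly. You need to skip the "panlex_lite" package during the download, and then it should work.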
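To confirm whether the import itself works after reinstalling, a quick check like the one below may help. It is only a sketch: `can_import` is a hypothetical helper name, not part of NLTK, and it only reports whether a module exposes a given name.

```python
import importlib


def can_import(module_name, attr_name):
    """Return True if `module_name` can be imported and exposes `attr_name`.

    Hypothetical helper for illustration; it is not part of NLTK.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module itself is missing or failed to import.
        return False
    return hasattr(module, attr_name)


if __name__ == "__main__":
    # Reports whether `from nltk.tokenize import TweetTokenizer` would succeed.
    print(can_import("nltk.tokenize", "TweetTokenizer"))
```

If this prints False while NLTK is installed, the installed NLTK package itself may be the thing to look at, rather than the downloaded data.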

  

This issue has not been fixed yet; the workaround is as follows:

I guess, we could add something like if id != 'panlex_lite' to the code...

But, as for me, the easiest way looks like this:

get https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml
remove panlex from it
upload it to a public Gist
pass the gist's url to the downloader: python -m nltk.downloader -d /usr/local/share/nltk_data -u https://gist.githubusercontent.com/demidovakatya/61dab385d74065ae825c80496a197980/raw/c6ff7fbf44265c7f8c9e961e3e1158cd812d6af1/index.xml all
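The "remove panlex from it" step above could be sketched in Python as follows. This is a rough illustration under assumptions: it assumes the index is XML whose entries carry an `id` attribute (packages) or a `ref` attribute (collection items), and `strip_packages` and the sample document are hypothetical, made up for the example. It works on a string, so you would still download index.xml and upload the result yourself.

```python
import xml.etree.ElementTree as ET


def strip_packages(index_xml, prefix="panlex"):
    """Return `index_xml` without elements whose id/ref starts with `prefix`.

    Hypothetical helper; the id/ref attribute layout is an assumption.
    """
    root = ET.fromstring(index_xml)
    # Snapshot the elements first so removal does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            ident = child.get("id") or child.get("ref") or ""
            if ident.startswith(prefix):
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


# Made-up sample that imitates the assumed layout of index.xml.
SAMPLE = """<nltk_data>
  <packages>
    <package id="punkt" url="https://example.invalid/punkt.zip"/>
    <package id="panlex_lite" url="https://example.invalid/panlex_lite.zip"/>
  </packages>
  <collections>
    <collection id="all" name="All packages">
      <item ref="punkt"/>
      <item ref="panlex_lite"/>
    </collection>
  </collections>
</nltk_data>"""

cleaned = strip_packages(SAMPLE)
print(cleaned)
```

The cleaned text can then be saved as index.xml, uploaded somewhere public, and its URL passed to the downloader with -u as in the command above.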

Here is the link to the reported issue: look at last 2 conversations