How do I make TextBlob work for all users on Ubuntu?

Date: 2017-05-09 07:58:31

Tags: python ubuntu-12.04

I'm trying to get some team members on our Ubuntu server up and running with TextBlob. When I run a script that uses TextBlob as root, it seems to work fine, but when I try it from a new account I created, I get the following error:

**********************************************************************
  Resource u'tokenizers/punkt/english.pickle' not found.  Please
  use the NLTK Downloader to obtain the resource:  >>>
  nltk.download()
  Searched in:
    - '/home/USERNAME/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - u''
**********************************************************************
Traceback (most recent call last):
  File "sampleClassifier.py", line 25, in <module>
    cl = NaiveBayesClassifier(train)
  File "/usr/local/lib/python2.7/dist-packages/textblob/classifiers.py", line 192, in __init__
    self.train_features = [(self.extract_features(d), c) for d, c in self.train_set]
  File "/usr/local/lib/python2.7/dist-packages/textblob/classifiers.py", line 169, in extract_features
    return self.feature_extractor(text, self.train_set)
  File "/usr/local/lib/python2.7/dist-packages/textblob/classifiers.py", line 81, in basic_extractor
    word_features = _get_words_from_dataset(train_set)
  File "/usr/local/lib/python2.7/dist-packages/textblob/classifiers.py", line 63, in _get_words_from_dataset
    return set(all_words)
  File "/usr/local/lib/python2.7/dist-packages/textblob/classifiers.py", line 62, in <genexpr>
    all_words = chain.from_iterable(tokenize(words) for words, _ in dataset)
  File "/usr/local/lib/python2.7/dist-packages/textblob/classifiers.py", line 59, in tokenize
    return word_tokenize(words, include_punc=False)
  File "/usr/local/lib/python2.7/dist-packages/textblob/tokenizers.py", line 72, in word_tokenize
    for sentence in sent_tokenize(text))
  File "/usr/local/lib/python2.7/dist-packages/textblob/base.py", line 64, in itokenize
    return (t for t in self.tokenize(text, *args, **kwargs))
  File "/usr/local/lib/python2.7/dist-packages/textblob/decorators.py", line 38, in decorated
    raise MissingCorpusError()
textblob.exceptions.MissingCorpusError:
Looks like you are missing some required data for this feature.

To download the necessary data, simply run

    python -m textblob.download_corpora

or use the NLTK downloader to download the missing data: http://nltk.org/data.html
If this doesn't fix the problem, file an issue at https://github.com/sloria/TextBlob/issues.

The machine we're using is quite small, so I can't overload it by downloading the corpora multiple times, once per user. Does anyone know how I can fix this? I've already installed the data for root, but I don't know where those packages live or how to find them.

1 answer:

Answer 0: (score: 0)

Follow the instructions in the docs. Try setting the NLTK_DATA environment variable and see if it works for the new users.
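A minimal sketch of the shared-install approach: the error message above already lists the system-wide directories NLTK searches (e.g. /usr/share/nltk_data), so downloading the corpora once into one of those paths makes them visible to every account without duplicating them per user. The exact corpus list below is an assumption about what your script needs; punkt is the one named in the error.

```shell
# Download the corpora into a directory that is already on NLTK's search
# path (see the "Searched in:" list in the error above), so every user
# finds a single shared copy. Requires root/sudo for /usr/share.
sudo python -m nltk.downloader -d /usr/share/nltk_data punkt brown wordnet

# Alternatively, keep the data anywhere readable by all users and point
# NLTK at it system-wide via the NLTK_DATA environment variable,
# e.g. in /etc/profile (path below is a hypothetical example):
export NLTK_DATA=/opt/nltk_data
```

Note that `python -m textblob.download_corpora` downloads into the current user's home directory (~/nltk_data), which is why the script works for root but not for other accounts; the `-d` flag of `nltk.downloader` is what lets you choose a shared location instead.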