How can I overcome the following problem when using TextRank?

Date: 2017-06-15 16:46:56

Tags: python-3.x nltk

Hello, I would like to use the following package, called textrank. See this URL for details:

https://github.com/davidadamojr/TextRank

After installing all of its dependencies with pip3, I tried to use the repository as follows:

textrank extract_summary test

But I get the following error:

MacBook-Pro:TextRank-master $ textrank extract_summary test 
Traceback (most recent call last):
  File "/usr/local/bin/textrank", line 11, in <module>
    load_entry_point('textrank==0.1.0', 'console_scripts', 'textrank')()
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/main.py", line 21, in extract_summary
    summary = textrank.extract_sentences(f.read())
  File "/usr/local/lib/python3.6/site-packages/textrank/__init__.py", line 169, in extract_sentences
    sent_detector = nltk.data.load('tokenizers/punkt/english.pickle')
  File "/usr/local/lib/python3.6/site-packages/nltk/data.py", line 801, in load
    opened_resource = _open(resource_url)
  File "/usr/local/lib/python3.6/site-packages/nltk/data.py", line 919, in _open
    return find(path_, path + ['']).open()
  File "/usr/local/lib/python3.6/site-packages/nltk/data.py", line 641, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource 'tokenizers/punkt/PY3/english.pickle' not found.
  Please use the NLTK Downloader to obtain the resource:  >>>
  nltk.download()
  Searched in:
    - '/Users/ad/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
**********************************************************************
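The "Searched in" list shows every directory where NLTK looked for the file. To check those directories yourself, here is a rough stdlib-only sketch. It is a simplification of NLTK's lookup (it ignores zipped data), the names `find_resource` and `SEARCHED` are hypothetical, and `~/nltk_data` is assumed to stand in for the user-specific `/Users/ad/nltk_data` from the error.

```python
import os
from typing import Iterable, List


def find_resource(resource: str, search_dirs: Iterable[str]) -> List[str]:
    """Return the full path of `resource` in every search directory that has it.

    Hypothetical helper: a simplified version of the lookup described in the
    LookupError. It only checks plain files/directories, not zip archives.
    """
    hits = []
    for base in search_dirs:
        # Skip empty entries and directories that do not exist.
        if not base or not os.path.isdir(base):
            continue
        candidate = os.path.join(base, *resource.split("/"))
        if os.path.exists(candidate):
            hits.append(candidate)
    return hits


# Directories copied from the error message (home directory assumed for ~).
SEARCHED = [
    os.path.expanduser("~/nltk_data"),
    "/usr/share/nltk_data",
    "/usr/local/share/nltk_data",
    "/usr/lib/nltk_data",
    "/usr/local/lib/nltk_data",
]
```

If `find_resource("tokenizers/punkt/PY3/english.pickle", SEARCHED)` returns an empty list, that matches the traceback: the punkt model is not installed in any of those locations.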

It seems that some of the nltk library's data files are missing, so I tried:

MacBook-Pro:TextRank-master adolfocamachogonzalez$ python3
Python 3.6.1 (default, Apr  4 2017, 09:40:21) 
[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download()
showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml

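However, I could not get the external resource. I tried copying and pasting the link into my browser, but all I got was an XML structure like the following: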

<?xml version="1.0"?>
<?xml-stylesheet href="index.xsl" type="text/xsl"?>
<nltk_data>
  <packages>
    <package checksum="d577c2cd0fdae148b36d046b14eb48e6" id="maxent_ne_chunker" languages="English" name="ACE Named Entity Chunker (Maximum entropy)" size="13404747" subdir="chunkers" unzip="1" unzipped_size="23604982" url="https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/chunkers/maxent_ne_chunker.zip" />
    <package author="Australian Broadcasting Commission" checksum="ffb36b67ff24cbf7daaf171c897eb904" id="abc" name="Australian Broadcasting Commission 2006" size="1487851" subdir="corpora" unzip="1" unzipped_size="4054966" url="https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/abc.zip" webpage="http://www.abc.net.au/" />
    <package checksum="ae529a1c5f13d6074f5b0d68d8edb537" contact="Gertjan van Noord" id="alpino" license="Distributed with permission of Gertjan van Noord" name="Alpino Dutch Treebank" size="2797255" subdir="corpora" unzip="1" unzipped_size="21604821" url="https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/alpino.zip" webpage="http://www.let.rug.nl/~vannoord/trees/" />
    <package checksum="d3be36b53ab201372f1cd63ffc75e9a9" copyright="Public Domain (not copyrighted)" id="biocreative_ppi" license="Public Domain" name="BioCreAtIvE (Critical Assessment of Information Extraction Systems in Biology)" size="223566" subdir="corpora" unzip="1" unzipped_size="1537086" url="https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/biocreative_ppi.zip" webpage="http://www.mitre.org/public/biocreative/" />
    <package author="W. N. Francis and H. Kucera" checksum="a0a8630959d3d937873b1265b0a05497" id="brown" license="May be used for non-commercial purposes." name="Brown Corpus" size="3314357" subdir="corpora" unzip="1" unzipped_size="10117565" url="https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/pa
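The XML in the browser is expected: the URL printed by `nltk.download()` points to `index.xml`, a catalog of the available data packages, not to the data itself. Each `<package>` element's `id` is the name you pass to `nltk.download()`, and its `url` points to the zip archive the downloader fetches. A small stdlib sketch (not NLTK's own code; `package_urls` and `INDEX_EXCERPT` are hypothetical names, and the excerpt is trimmed from the index above) that lists those ids and URLs:

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Trimmed excerpt of the index shown above (most attributes dropped).
INDEX_EXCERPT = """<?xml version="1.0"?>
<nltk_data>
  <packages>
    <package id="maxent_ne_chunker" subdir="chunkers"
             url="https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/chunkers/maxent_ne_chunker.zip" />
    <package id="abc" subdir="corpora"
             url="https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/abc.zip" />
  </packages>
</nltk_data>"""


def package_urls(index_xml: str) -> Dict[str, str]:
    """Map each package id in an NLTK data index to its archive URL."""
    root = ET.fromstring(index_xml)
    return {pkg.get("id"): pkg.get("url") for pkg in root.iter("package")}
```

Run against the full index, the same lookup would show an entry for `punkt`, the package that contains `english.pickle`.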

1 Answer:

Answer 0 (score: 1)

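The file english.pickle is part of the "punkt" tokenizer, which splits text into sentences. To download it, run the following once (or find "punkt" under Models in the interactive downloader):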

import nltk
nltk.download("punkt")

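The downloader checks its standard list of paths for a writable location and saves the model files there. After that, textrank will be able to load the model internally.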