Python nltk.download() gives a parser error

Date: 2017-04-14 13:40:32

Tags: python nltk

I'm trying to run the following commands:

import nltk
nltk.download('all')

but I get this error:

Traceback (most recent call last):
  File "./update.py", line 3, in <module>
    nltk.download('all')
  File "/usr/lib/python3.6/site-packages/nltk/downloader.py", line 664, in download
    for msg in self.incr_download(info_or_id, download_dir, force):
  File "/usr/lib/python3.6/site-packages/nltk/downloader.py", line 534, in incr_download
    try: info = self._info_or_id(info_or_id)
  File "/usr/lib/python3.6/site-packages/nltk/downloader.py", line 508, in _info_or_id
    return self.info(info_or_id)
  File "/usr/lib/python3.6/site-packages/nltk/downloader.py", line 875, in info
    self._update_index()
  File "/usr/lib/python3.6/site-packages/nltk/downloader.py", line 825, in _update_index
    ElementTree.parse(compat.urlopen(self._url)).getroot())
  File "/usr/lib/python3.6/xml/etree/ElementTree.py", line 1196, in parse
    tree.parse(source, parser)
  File "/usr/lib/python3.6/xml/etree/ElementTree.py", line 597, in parse
    self._root = parser._parse_whole(source)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 23, column 143

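I'm new to Python, so I'm not sure what to do. I looked at the modules listed in the traceback and found that the downloader fetches an XML index file and then parses it. So I ran the following command myself, and it completed without any error: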

compat.urlopen('https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml')

So I think the problem is not in the download but in the parser. Can anyone suggest how to proceed from here?
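One way to separate "the download failed" from "the downloaded XML is malformed" is to fetch the same index.xml and parse it yourself, printing the text around the reported position. The sketch below uses only the standard library; the function names `locate_parse_error` and `check_remote_index` are hypothetical helpers, not part of NLTK, and the snippet offsets are approximate if the file contains multibyte characters.

```python
import xml.etree.ElementTree as ElementTree
from urllib.request import urlopen

# Same index URL the question fetched manually.
INDEX_URL = "https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml"


def locate_parse_error(xml_bytes, context=40):
    """Return None if xml_bytes is well-formed, else (line, column, snippet near the error)."""
    try:
        ElementTree.fromstring(xml_bytes)
        return None
    except ElementTree.ParseError as err:
        # expat reports a 1-based line number and a 0-based column.
        line_no, col = err.position
        lines = xml_bytes.decode("utf-8", errors="replace").splitlines()
        text = lines[line_no - 1] if 0 < line_no <= len(lines) else ""
        # Slice a window of characters around the column (approximate for non-ASCII text).
        return line_no, col, text[max(0, col - context):col + context]


def check_remote_index(url=INDEX_URL):
    """Hypothetical helper: download the index, then try to parse it."""
    return locate_parse_error(urlopen(url).read())
```

Calling `check_remote_index()` at the time of the question would be expected to return line 23 and column 143 together with the offending text, which points at the XML content rather than at the parser.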

2 answers:

Answer 0 (score: 6)

index.xml had a typo. It has already been patched. I just checked, and nltk.download('all') works fine now!

See: nltk/nltk_data#70

Answer 1 (score: 1)

The problem is in the XML that NLTK serves.

xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 23, column 143

At line 23, column 143, we can see the problem: a missing '=':

... unzip="1" unzipped_size"1917" url="https...
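To confirm that the missing '=' alone explains the error, a minimal element modeled on that fragment can be parsed with and without the fix. The element name and the `id`/`url` values below are placeholders chosen for illustration, not the real index.xml entry.

```python
import xml.etree.ElementTree as ElementTree

# Hypothetical minimal element modeled on the fragment above (placeholder values).
BROKEN = '<package id="example" unzip="1" unzipped_size"1917" url="https://example.invalid/x.zip"/>'
# Insert the missing '=' between the attribute name and its value.
FIXED = BROKEN.replace('unzipped_size"', 'unzipped_size="')


def parses(xml_text):
    """Return True if xml_text is well-formed XML, printing the ParseError otherwise."""
    try:
        ElementTree.fromstring(xml_text)
        return True
    except ElementTree.ParseError as err:
        print("ParseError:", err)
        return False


print(parses(BROKEN))  # expat rejects the attribute with no '='
print(parses(FIXED))
print(ElementTree.fromstring(FIXED).get("unzipped_size"))
```

The broken version fails with the same kind of "not well-formed (invalid token)" message seen in the traceback, while the corrected version parses and exposes `unzipped_size` as "1917".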

NLTK will surely fix this soon; until then, I'm not sure what the best answer is.
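Since the index has since been fixed upstream, this is only relevant as a stopgap for a similar typo. A possible workaround, assuming the only defect is the `unzipped_size"` typo shown above, is to download index.xml, repair it, save it locally, and point NLTK's `Downloader` at the local copy through its `server_index_url` parameter. Package URLs inside the index are absolute, so the actual data would still come from the remote server. The helper names and the temp-file location are hypothetical, and this sketch has not been checked against every NLTK version.

```python
import tempfile
import xml.etree.ElementTree as ElementTree
from pathlib import Path
from urllib.request import urlopen

INDEX_URL = "https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml"


def patch_index(xml_bytes):
    """Repair only the specific missing '=' from the answer, then verify the result parses."""
    fixed = xml_bytes.replace(b'unzipped_size"', b'unzipped_size="')
    ElementTree.fromstring(fixed)  # raises ParseError if the file is still malformed
    return fixed


def download_with_patched_index(package="all"):
    """Hypothetical workaround: serve a patched index from a local file:// URL."""
    # Imported here so patch_index() can be used without NLTK installed.
    from nltk.downloader import Downloader

    fixed = patch_index(urlopen(INDEX_URL).read())
    path = Path(tempfile.gettempdir()) / "nltk_index_patched.xml"
    path.write_bytes(fixed)
    downloader = Downloader(server_index_url=path.as_uri())
    return downloader.download(package)  # returns True on success
```

Usage would be `download_with_patched_index("all")`; once the upstream file is corrected, plain `nltk.download('all')` is the simpler choice.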