Python rdflib没有正确解析Creative Commons License信息

时间:2013-07-04 05:20:34

标签: python rdf rdflib

我使用的是rdflib版本3.2.3,一切正常。升级到4.0.1后,我开始收到错误:

  

RDFa解析错误! 'ascii'编解码器无法解码位置5454中的字节0xc3:序数不在范围内(128)

我尝试了各种方法来完成这项工作,但到目前为止还没有成功。以下是我的尝试。

在每种情况下我都是:

from rdflib import Graph

首次尝试:

>>> lg =Graph()
>>> len(lg.parse('http://creativecommons.org/licenses/by/3.0/'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/alex/Projects/RDF/rdfEnv/local/lib/python2.7/site-packages/rdflib/graph.py", line 1002, in parse
parser.parse(source, self, **args)
File "/home/alex/Projects/RDF/rdfEnv/local/lib/python2.7/site-packages/rdflib/plugins/parsers/structureddata.py", line 268, in parse
vocab_cache=vocab_cache)
File "/home/alex/Projects/RDF/rdfEnv/local/lib/python2.7/site-packages/rdflib/plugins/parsers/structureddata.py", line 148, in _process
_check_error(processor_graph)
File "/home/alex/Projects/RDF/rdfEnv/local/lib/python2.7/site-packages/rdflib/plugins/parsers/structureddata.py", line 57, in _check_error
raise Exception("RDFa parsing Error! %s" % msg)
Exception: RDFa parsing Error! 'ascii' codec can't decode byte 0xc3 in position 4801: ordinal not in range(128)

第二次尝试:

>>> lg =Graph()
>>> len(lg.parse('http://creativecommons.org/licenses/by/3.0/rdf'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/alex/Projects/RDF/rdfEnv/local/lib/python2.7/site-packages/rdflib/graph.py", line 1002, in parse
parser.parse(source, self, **args)
File "/home/alex/Projects/RDF/rdfEnv/local/lib/python2.7/site-packages/rdflib/plugins/parsers/rdfxml.py", line 570, in parse
self._parser.parse(source)
File "/usr/lib/python2.7/xml/sax/expatreader.py", line 107, in parse
xmlreader.IncrementalParser.parse(self, source)
File "/usr/lib/python2.7/xml/sax/xmlreader.py", line 123, in parse
self.feed(buffer)
File "/usr/lib/python2.7/xml/sax/expatreader.py", line 207, in feed
self._parser.Parse(data, isFinal)
File "/usr/lib/python2.7/xml/sax/expatreader.py", line 349, in end_element_ns
self._cont_handler.endElementNS(pair, None)
File "/home/alex/Projects/RDF/rdfEnv/local/lib/python2.7/site-packages/rdflib/plugins/parsers/rdfxml.py", line 160, in endElementNS
self.current.end(name, qname)
File "/home/alex/Projects/RDF/rdfEnv/local/lib/python2.7/site-packages/rdflib/plugins/parsers/rdfxml.py", line 461, in property_element_end
current.data, literalLang, current.datatype)
File "/home/alex/Projects/RDF/rdfEnv/local/lib/python2.7/site-packages/rdflib/term.py", line 541, in __new__
raise Exception("'%s' is not a valid language tag!"%lang)
Exception: 'i18n' is not a valid language tag!

第三次尝试:没有错误但也没有给出任何结果

>>> lg =Graph()
>>> len(lg.parse('http://creativecommons.org/licenses/by/3.0/rdf', format='rdfa'))
0

所以有人请告诉我我错了什么! :)

1 个答案:

答案 0 :(得分:3)

正如格雷厄姆在rdflib邮件列表上回复的那样,有一个html5lib问题 - 我们会在下一个版本中为python 2正确引导它,但现在只需这样做:

pip install html5lib==0.95

第二个问题是来自广告素材的数据存在问题,根据rfc5646,“i18n”确实不是有效的语言标记。我添加了支票,但回想起来,提出异常似乎是严格的。我想我会把它改成警告。