How can I extract DMOZ URLs from the RDF dump using Python and rdflib?

Asked: 2015-04-06 00:57:34

Tags: python rdf rdflib dmoz

I am trying to open an RDF file (the DMOZ RDF dump), but I get this error message:

Traceback (most recent call last):
  File "/media/_dev_/ODP_RDF_get_links.py", line 4, in <module>
    result = g.parse("data/content.rdf")
  File "/usr/local/lib/python2.7/dist-packages/rdflib/graph.py", line 1033, in parse
    parser.parse(source, self, **args)
  File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/parsers/rdfxml.py", line 577, in parse
    self._parser.parse(source)
  File "/usr/lib/python2.7/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib/python2.7/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/lib/python2.7/xml/sax/expatreader.py", line 210, in feed
    self._parser.Parse(data, isFinal)
  File "/usr/lib/python2.7/xml/sax/expatreader.py", line 352, in end_element_ns
    self._cont_handler.endElementNS(pair, None)
  File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/parsers/rdfxml.py", line 160, in endElementNS
    self.current.end(name, qname)
  File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/parsers/rdfxml.py", line 331, in node_element_end
    self.error("Repeat node-elements inside property elements: %s"%"".join(name))
  File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/parsers/rdfxml.py", line 185, in error
    raise ParserError(info + message)
file:///media/_dev_/data/content.rdf:5:12: Repeat node-elements inside property elements: http://dmoz.org/rdf/catid

My simple code is as follows:

import rdflib

g = rdflib.Graph()
result = g.parse("data/content.rdf")

print("graph has %s statements." % len(g))

  1. I need to be able to read the file.
  2. I need to extract all links in the World category.

Thanks for any help.

Edit

PS: I found this Wikipedia section on the DMOZ RDF dumps (rdf_dumps), so it seems a custom script is necessary to work with this dump.
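
A custom script of the kind mentioned in the PS might look roughly like the sketch below. It bypasses rdflib and streams the file with the standard library's xml.etree.ElementTree.iterparse, treating the dump as plain XML. It assumes that each link is an ExternalPage element with the URL in an "about" attribute and the category path in a topic child element, and that World categories start with "Top/World"; none of this is confirmed by the question, apart from the http://dmoz.org/rdf/ namespace seen in the error message. The names iter_links, local_name and SAMPLE are made up for illustration, and the sketch does not handle other problems a real dump may have, such as invalid characters.

```python
import io
import xml.etree.ElementTree as ET


def local_name(name):
    """Drop a leading '{namespace}' from an ElementTree tag or attribute name."""
    return name.rsplit("}", 1)[-1]


def iter_links(source, category="Top/World"):
    """Yield (url, topic) for each ExternalPage filed under `category`.

    `source` is a filename or a binary file object. The element and
    attribute names used here are assumptions about the dump layout.
    """
    prefix = category + "/"
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "ExternalPage":
            # The page URL is assumed to sit in an "about" attribute.
            url = next(
                (v for k, v in elem.attrib.items() if local_name(k) == "about"),
                None,
            )
            # The category path is assumed to sit in a <topic> child.
            topic = next(
                ((c.text or "").strip() for c in elem if local_name(c.tag) == "topic"),
                None,
            )
            if url and topic and (topic == category or topic.startswith(prefix)):
                yield url, topic
        if name in ("ExternalPage", "Topic"):
            # Drop the finished record's children and text to limit memory use.
            elem.clear()


# Tiny made-up sample in the assumed layout, used only to exercise the function.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<RDF xmlns:r="http://www.w3.org/TR/RDF/"
     xmlns:d="http://purl.org/dc/elements/1.0/"
     xmlns="http://dmoz.org/rdf/">
  <Topic r:id="Top/World"><catid>1</catid></Topic>
  <ExternalPage about="http://example.org/a">
    <d:Title>A</d:Title>
    <topic>Top/World/Deutsch</topic>
  </ExternalPage>
  <ExternalPage about="http://example.org/b">
    <topic>Top/Arts</topic>
  </ExternalPage>
</RDF>"""

links = list(iter_links(io.BytesIO(SAMPLE)))
print(links)
```

For the real file, the same generator could be driven with something like `for url, topic in iter_links("data/content.rdf"): ...`, writing each URL out as it arrives rather than collecting them all in a list, since the dump is large.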

0 answers:

No answers yet