I am trying to use DBLP in RDF, as provided by the RDF dump of DBLP. I tried to convert the file to Turtle format using Jena's rdfcat:
rdfcat -x dblp-2006-02-06.rdf -out t > dblp.ttl
Unfortunately, this aborts with the following error message:
Exception in thread "main" org.apache.jena.riot.RiotException: [line: 378, col:
147] {E202} Expecting XML start or end element(s). String data "
????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????" not allowed.
Maybe a striping error.
at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.error(ErrorHandlerFactory.java:128)
at org.apache.jena.riot.lang.LangRDFXML$ErrorHandlerBridge.error(LangRDFXML.java:246)
…
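As far as I can tell from another question, What is a striping error?, a striping error occurs during RDF/XML parsing when the hierarchical XML structure does not follow the even/odd rule of RDF/XML, under which node elements and property elements alternate at successive nesting levels. Now, looking at the file, the relevant part looks like this: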
<rdf:Description rdf:about="http://www.informatik.uni-trier.de/~ley/db/journals/ac/ac40.html#YousifTD95"><dc:identifier>journals/ac/YousifTD95</dc:identifier><dc:date>2002-01-03</dc:date><rdf:type rdf:resource="http://sw.deri.org/~aharth/2004/07/dblp/dblp.owl#Article"/>
<dc:creator><foaf:Person rdf:nodeID="MazinSYousif"><foaf:name>Mazin S. Yousif</foaf:name></foaf:Person></dc:creator>
<dc:creator><foaf:Person rdf:nodeID="MatthewThazhuthaveetil"><foaf:name>Matthew Thazhuthaveetil</foaf:name></foaf:Person></dc:creator>
<dc:creator><foaf:Person rdf:nodeID="ChitaRDas"><foaf:name>Chita R. Das</foaf:name></foaf:Person></dc:creator>
<dc:title rdf:parseType="Literal">Cache Coherence in Multiprocessors: A Survey.</dc:title>
<pages>127-179</pages>
<year>1995</year>
<volume>40</volume>
<journal>Advances in Computers</journal>
</rdf:Description>
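According to nano, line 378 appears to be the one containing Matthew Thazhuthaveetil. However, I cannot see where this line has a structural problem, especially when comparing it with the other lines. Is there really a structural problem there (and if so, what is it), or is the error message misleading?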
Answer 0 (score: 0)
I tried this myself with Apache Jena 2.11.1 and it worked fine. Have you tried 'riot --validate'?
The error is strange:
Exception in thread "main" org.apache.jena.riot.RiotException: [line: 378, col:
147] {E202} Expecting XML start or end element(s). String data "
????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????" not allowed.
Maybe a striping error.
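It does not show any printable characters, which is mysterious.
The error just means that the RDF contains non-whitespace characters outside the property tags. That suggests there may be invisible garbage in the file, possibly trailing </dc:creator>?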
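To make the condition behind E202 concrete, here is a minimal Python sketch (not Jena's implementation) that walks a small RDF/XML excerpt and reports character data sitting where striping only allows child elements. The function name find_stray_text, the SAMPLE document and its example.org URI are hypothetical, and the handling of rdf:parseType is deliberately simplified.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PARSE_TYPE = f"{{{RDF_NS}}}parseType"


def find_stray_text(elem, kind="root"):
    """Return (tag, text) pairs for character data found where RDF/XML
    striping only allows child elements or whitespace.

    kind is "root" (the rdf:RDF element), "node", or "property".
    """
    findings = []
    parse_type = elem.get(PARSE_TYPE)
    if kind == "property" and parse_type == "Literal":
        return findings  # XML literal: arbitrary content is allowed
    children = list(elem)
    # A property element may hold plain text only when it has no child elements.
    if kind != "property" or children:
        for chunk in [elem.text] + [child.tail for child in children]:
            if chunk and chunk.strip():
                findings.append((elem.tag, chunk.strip()))
    # Striping: node elements contain property elements and vice versa.
    if kind == "node":
        child_kind = "property"
    elif kind == "property" and parse_type == "Resource":
        child_kind = "property"
    else:
        child_kind = "node"  # children of rdf:RDF or of a property element
    for child in children:
        findings.extend(find_stray_text(child, child_kind))
    return findings


# Hypothetical excerpt with the word "junk" trailing </dc:creator>.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://example.org/paper">
    <dc:creator><foaf:Person><foaf:name>A. Author</foaf:name></foaf:Person></dc:creator>junk
    <dc:title rdf:parseType="Literal">A <i>title</i>.</dc:title>
  </rdf:Description>
</rdf:RDF>"""

if __name__ == "__main__":
    for tag, text in find_stray_text(ET.fromstring(SAMPLE)):
        print(tag, repr(text))
```

This is only meant for a small excerpt pasted out of the dump. It will not catch characters that already make the XML itself ill-formed, since ET.fromstring would raise before the walk starts.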
I do not see anything like that, so it feels like an IO error somewhere.
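One way to check for invisible garbage on your copy of the file is to dump the code points of anything on the reported line that is not printable ASCII. The sketch below is only an illustration: the helper names read_line and describe_suspicious_chars are made up, it assumes the file is UTF-8 (undecodable bytes show up as U+FFFD), and Jena may count lines and columns slightly differently than Python does.

```python
import unicodedata


def read_line(path, line_number, encoding="utf-8"):
    """Return the 1-based line `line_number` of the file, or None if absent."""
    with open(path, "r", encoding=encoding, errors="replace") as fh:
        for current, line in enumerate(fh, start=1):
            if current == line_number:
                return line
    return None


def describe_suspicious_chars(line, start_col=0, end_col=None):
    """Return (column, code point, Unicode name) for each character in
    line[start_col:end_col] that is neither printable ASCII nor ordinary
    whitespace. Columns are 1-based. Legitimate non-ASCII letters (for
    example accented author names) are reported too, so read the output
    with that in mind."""
    findings = []
    for offset, ch in enumerate(line[start_col:end_col]):
        if ch in " \t\r\n" or 32 < ord(ch) < 127:
            continue
        name = unicodedata.name(ch, "<unnamed>")
        findings.append((start_col + offset + 1, f"U+{ord(ch):04X}", name))
    return findings
```

For the case in the question, something like describe_suspicious_chars(read_line("dblp-2006-02-06.rdf", 378), 140) would list whatever sits around column 147. An empty list there would support the IO-error theory rather than a problem in the file itself.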