Why does reading RDF triple data from a file throw an exception?

Date: 2014-04-11 17:18:07

Tags: java hbase rdf bigdata jena

I am using JDK 7, and my Jena library version is 2.11.1.

Below is my sample triple data file, named RDF.nt:

<http://sce.umkc.edu/> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Ontology> .
<http://sce.umkc.edu/> <http://www.w3.org/2002/07/owl#imports> <http://purl.uniprot.org/core/> .
<http://purl.uniprot.org/uniprot/Q6GZX4> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.uniprot.org/core/Protein> .
<http://purl.uniprot.org/uniprot/Q6GZX4> <http://purl.uniprot.org/core/reviewed> <true"^^<http://www.w3.org/2001/XMLSchema#boolean> .
<http://purl.uniprot.org/uniprot/Q6GZX4> <http://purl.uniprot.org/core/created> <2011-06-28"^^<http://www.w3.org/2001/XMLSchema#date> .
<http://purl.uniprot.org/uniprot/Q6GZX4> <http://purl.uniprot.org/core/modified> <2011-07-27"^^<http://www.w3.org/2001/XMLSchema#date> .
<http://purl.uniprot.org/uniprot/Q6GZX4> <http://purl.uniprot.org/core/version> <22"^^<http://www.w3.org/2001/XMLSchema#int> .
<http://purl.uniprot.org/uniprot/Q6GZX4> <http://purl.uniprot.org/core/mnemonic> <001R_FRG3G" .
<http://purl.uniprot.org/uniprot/Q6GZX4> <http://purl.uniprot.org/core/citation> <http://purl.uniprot.org/citations/15165820> .
<http://sce.umkc.edu/#_5136475A5834001> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement> .
<http://sce.umkc.edu/#_5136475A5834001> <http://www.w3.org/1999/02/22-rdf-syntax-ns#subject> <http://purl.uniprot.org/uniprot/Q6GZX4> .
<http://sce.umkc.edu/#_5136475A5834001> <http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate> <http://purl.uniprot.org/core/citation> .

My Java code:

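import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.util.FileManager;

public class ReadRDF {
    public static void main(String[] args) {
        // file name matches the data file RDF.nt
        String inputFileName = "RDF.nt";
        // use the FileManager to find the input file and parse it as N-Triples
        Model model = FileManager.get().loadModel(inputFileName, null,
                "N-TRIPLES");
        // write the model back out as N-Triples
        model.write(System.out, "N-TRIPLES");
    }
}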

Error:

Exception in thread "main" org.apache.jena.riot.RiotException: [line: 4, col: 91] Broken IRI (bad character: '<'): true"^^
    at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:136)
    at org.apache.jena.riot.lang.LangEngine.raiseException(LangEngine.java:163)
    at org.apache.jena.riot.lang.LangEngine.nextToken(LangEngine.java:106)
    at org.apache.jena.riot.lang.LangNTriples.parseOne(LangNTriples.java:67)
    at org.apache.jena.riot.lang.LangNTriples.runParser(LangNTriples.java:54)
    at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:42)
    at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTFactoryImpl$1.read(RDFParserRegistry.java:142)
    at org.apache.jena.riot.RDFDataMgr.process(RDFDataMgr.java:859)
    at org.apache.jena.riot.RDFDataMgr.parse(RDFDataMgr.java:687)
    at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:208)
    at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:141)
    at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:130)
    at org.apache.jena.riot.adapters.AdapterFileManager.readModelWorker(AdapterFileManager.java:291)
    at com.hp.hpl.jena.util.FileManager.loadModelWorker(FileManager.java:333)
    at com.hp.hpl.jena.util.FileManager.loadModel(FileManager.java:320)
    at com.jena.main.ReadRDF.main(ReadRDF.java:10)

Please help me read this data, and tell me how to store RDF data in an HBase database.

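How can I ignore the bad character '<'? My file has more than 1 million records, and changing every record by hand would take a very long time. Please suggest an alternative.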

2 answers:

Answer 0 (score: 1)

Your data is broken. You will need to fix the errors that @user205512 has already pointed out in their answer before you can make any progress.

Another thing to be aware of is that there is no serialization called N-TURTLES; you mean N-TRIPLES.

Your code probably only works because Jena ignores the unknown language name and detects the input format from the file extension instead.
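To make the fallback described above concrete, here is a minimal sketch, assuming the rule is "use the requested language name if it is recognised, otherwise guess from the file extension". The class `LangFallback`, its `resolveLang` helper, and the name and extension tables are hypothetical illustrations, not Jena's internal registry; Jena's real lookup lives in RIOT and knows many more names and aliases.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class LangFallback {
    // Hypothetical tables for illustration only; not Jena's own registry.
    private static final Set<String> KNOWN = new HashSet<>(
            Arrays.asList("N-TRIPLES", "TURTLE", "RDF/XML", "N3"));
    private static final Map<String, String> BY_EXTENSION = new HashMap<>();
    static {
        BY_EXTENSION.put("nt", "N-TRIPLES");
        BY_EXTENSION.put("ttl", "TURTLE");
        BY_EXTENSION.put("rdf", "RDF/XML");
        BY_EXTENSION.put("owl", "RDF/XML");
        BY_EXTENSION.put("n3", "N3");
    }

    // Use the requested name if it is recognised, otherwise guess from the extension.
    static String resolveLang(String requested, String fileName) {
        if (requested != null && KNOWN.contains(requested.toUpperCase(Locale.ROOT))) {
            return requested.toUpperCase(Locale.ROOT);
        }
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.get(ext); // null if the extension is unknown too
    }

    public static void main(String[] args) {
        System.out.println(resolveLang("N-TURTLES", "RDF.nt")); // N-TRIPLES
        System.out.println(resolveLang("TURTLE", "RDF.nt"));    // TURTLE
    }
}
```

The point of the sketch is that a misspelled name like "N-TURTLES" can go unnoticed when the file happens to end in `.nt`, so it is safer to pass a correct language name explicitly.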

Answer 1 (score: 1)

Your data is bad:

<true"^^<http://www.w3.org/2001/XMLSchema#boolean>

is not a literal. A typed literal is written with a leading double quote: "true"^^<http://www.w3.org/2001/XMLSchema#boolean>
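A rough sketch of the distinction drawn here, assuming the object term has already been split out of its line: an IRI must be wrapped in `<...>` and, per the N-Triples grammar, may not contain characters such as `"`, `<` or a space, while a literal starts with a double quote. `TermCheck` and `classify` are hypothetical names, and this is not a full N-Triples validator (it does not check `\uXXXX` escapes or escaping inside literals).

```java
public class TermCheck {
    enum Kind { IRI, LITERAL, MALFORMED }

    // Characters N-Triples forbids inside <...>, besides control chars and space.
    // Rough check only: \\uXXXX escapes are not validated.
    private static final String FORBIDDEN_IN_IRI = "<>\"{}|^`";

    static Kind classify(String term) {
        if (term.length() >= 2 && term.startsWith("\"")) {
            // A literal starts with a quote and needs a closing quote somewhere after it.
            return term.lastIndexOf('"') > 0 ? Kind.LITERAL : Kind.MALFORMED;
        }
        if (term.length() >= 2 && term.startsWith("<") && term.endsWith(">")) {
            String body = term.substring(1, term.length() - 1);
            for (char c : body.toCharArray()) {
                if (c <= 0x20 || FORBIDDEN_IN_IRI.indexOf(c) >= 0) {
                    return Kind.MALFORMED; // e.g. the '"' in <true"^^<...>
                }
            }
            return Kind.IRI;
        }
        return Kind.MALFORMED;
    }

    public static void main(String[] args) {
        System.out.println(classify("<http://purl.uniprot.org/core/Protein>"));
        System.out.println(classify("<true\"^^<http://www.w3.org/2001/XMLSchema#boolean>"));
        System.out.println(classify("\"true\"^^<http://www.w3.org/2001/XMLSchema#boolean>"));
    }
}
```

Run against the sample terms, this prints IRI, MALFORMED and LITERAL, which is the same distinction Jena's parser makes when it reports "Broken IRI".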

I suspect many of the other literals have the same error.
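With over a million records, fixing the file by hand is impractical, but a script can do it. A minimal sketch of a streaming repair, assuming every broken line has the shape seen in the sample (IRI subject, IRI predicate, and an object whose opening `"` was turned into `<`): it rewrites `<value"` into `"value"` and copies every other line unchanged. The class name `RepairNTriples` and the default file names `RDF.nt` / `RDF-fixed.nt` are assumptions; literals that are broken in some other way will not be repaired, so spot-check the output before loading it with Jena. It uses only JDK 7 APIs.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

public class RepairNTriples {
    // Assumed broken shape: <subject> <predicate> <value"...
    // Group 1 = subject and predicate, group 2 = the literal's lexical value.
    private static final Pattern BROKEN_OBJECT = Pattern.compile(
            "^(<[^>]*>\\s+<[^>]*>\\s+)<([^<>\"]*)\"");

    // Turn <value" back into "value"; lines that don't match are returned as-is.
    static String repairLine(String line) {
        return BROKEN_OBJECT.matcher(line).replaceFirst("$1\"$2\"");
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default paths; pass your own as arguments.
        Path in = Paths.get(args.length > 0 ? args[0] : "RDF.nt");
        Path out = Paths.get(args.length > 1 ? args[1] : "RDF-fixed.nt");
        long fixed = 0;
        // Stream line by line so a file with millions of triples never sits in memory.
        try (BufferedReader r = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = r.readLine()) != null) {
                String repaired = repairLine(line);
                if (!repaired.equals(line)) {
                    fixed++;
                }
                w.write(repaired);
                w.newLine();
            }
        }
        System.out.println("Repaired " + fixed + " lines");
    }
}
```

Run it as `java RepairNTriples RDF.nt RDF-fixed.nt`, then load `RDF-fixed.nt` instead of the original file. Anchoring the pattern to the object position (after two IRIs) keeps it from touching a `<` that legitimately appears inside a well-formed literal.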