I am using ARQ to query a local RDF file. The command I run is:
./arq --data /home/datasets/a-m-00027.nt --results CSV --query myQuery.sparql
The file myQuery.sparql contains this query:
PREFIX basekb:<http://rdf.basekb.com/ns/>
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?x
FROM </home/data/a-m-00027.nt>
WHERE {?x rdf:type basekb:music.release}
LIMIT 10
Running it fails with this exception:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.concurrent.CopyOnWriteArrayList.iterator(CopyOnWriteArrayList.java:959)
at com.hp.hpl.jena.graph.impl.SimpleEventManager.notifyAddTriple(SimpleEventManager.java:97)
at com.hp.hpl.jena.graph.impl.GraphBase.notifyAdd(GraphBase.java:124)
at com.hp.hpl.jena.graph.impl.GraphBase.add(GraphBase.java:203)
at com.hp.hpl.jena.sparql.core.DatasetGraphCollection.add(DatasetGraphCollection.java:43)
at com.hp.hpl.jena.sparql.core.DatasetGraphBase.add(DatasetGraphBase.java:82)
at org.apache.jena.riot.system.StreamRDFLib$ParserOutputDataset.triple(StreamRDFLib.java:206)
at org.apache.jena.riot.lang.LangNTriples.runParser(LangNTriples.java:61)
at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:42)
at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:185)
at org.apache.jena.riot.RDFDataMgr.process(RDFDataMgr.java:906)
at org.apache.jena.riot.RDFDataMgr.parse(RDFDataMgr.java:687)
at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:534)
at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:501)
at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:454)
at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:432)
at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:422)
at arq.cmdline.ModDatasetGeneral.addGraphs(ModDatasetGeneral.java:98)
at arq.cmdline.ModDatasetGeneral.createDataset(ModDatasetGeneral.java:87)
at arq.cmdline.ModDatasetGeneralAssembler.createDataset(ModDatasetGeneralAssembler.java:35)
at arq.cmdline.ModDataset.getDataset(ModDataset.java:34)
at arq.query.getDataset(query.java:176)
at arq.query.queryExec(query.java:198)
at arq.query.exec(query.java:159)
at arq.cmdline.CmdMain.mainMethod(CmdMain.java:102)
at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
at arq.arq.main(arq.java:28)
The file contains triples such as:
<http://rdf.basekb.com/ns/architecture.building_complex> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdf.basekb.com/ns/type.type>
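Is the whole file loaded into memory?
Answer 0 (score: 4)
Is the whole file loaded into memory?
Exactly, and that is your problem. As has been said, you may be able to bump the Java heap size so that the file fits.
As an alternative, or if you simply do not have enough memory, try storing and indexing the file with TDB and then querying the store: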
$ tdbloader --loc my_tdb_store /home/datasets/a-m-00027.nt
$ tdbquery --loc my_tdb_store --results CSV --query myQuery.sparql
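(You can delete the store when you are done; it is just a directory named my_tdb_store.)
As a third option, you can skip SPARQL entirely. You are only looking for the first ten things of type basekb:music.release, which you can find like this: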
$ riot /home/datasets/a-m-00027.nt | \
grep '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdf.basekb.com/ns/music.release> .' | \
cut -d ' ' -f 1 | \
head -10
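This uses minimal memory, because the triples are streamed through the pipeline rather than loaded into a graph.
Answer 1 (score: 3)
As the exception tells you, you are running out of memory: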
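Because N-Triples puts one triple per line, the same streaming idea can be sketched without riot at all. This is only a sketch: it assumes the file is already well-formed N-Triples with single spaces between terms (riot normalises this, a raw file may not), and the function name first_subjects_of_type is hypothetical.

```shell
# Hypothetical helper: print the subjects of the first N triples whose
# predicate is rdf:type and whose object is the given class IRI.
# Assumes one triple per line with terms separated by single spaces.
first_subjects_of_type() {
    local file="$1"
    local class_iri="$2"
    local limit="${3:-10}"
    local rdf_type='<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>'
    # -F matches a fixed string, so the dots in the IRIs are not regex wildcards.
    # -m stops reading the file after $limit matching lines.
    grep -F -m "$limit" -- " ${rdf_type} <${class_iri}> ." "$file" | cut -d ' ' -f 1
}
```

It could be called as first_subjects_of_type /home/datasets/a-m-00027.nt http://rdf.basekb.com/ns/music.release 10. Since grep stops after the requested number of matches, it may not need to read the whole file.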
java.lang.OutOfMemoryError: GC overhead limit exceeded
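Most likely the machine itself is not out of memory; the limit comes from your JVM settings, which by default cap the heap at a certain size. As described in https://stackoverflow.com/a/21197787/1423333, try running: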
JVM_ARGS="-Xmx4096M" ./arq --data /home/datasets/a-m-00027.nt --results CSV --query myQuery.sparql
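Before picking an -Xmx value it can help to see what ceiling the JVM applies by default. A small sketch, assuming a HotSpot-based java on the PATH (other JVMs may not accept -XX:+PrintFlagsFinal); the function name default_max_heap_mib is hypothetical.

```shell
# Hypothetical helper: print the JVM's default maximum heap size in MiB.
# Assumes a HotSpot-based `java` on PATH; relies on -XX:+PrintFlagsFinal,
# whose rows look like: <type> MaxHeapSize = <bytes> {product} ...
default_max_heap_mib() {
    java -XX:+PrintFlagsFinal -version 2>/dev/null |
        awk '$2 == "MaxHeapSize" { printf "%d\n", $4 / (1024 * 1024) }'
}
```

If the number printed is well below the size the parsed file needs in memory, raising the limit through JVM_ARGS as above is the first thing to try.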