Loading triples into Blazegraph using the bulk data loader

Time: 2016-06-13 17:57:04

Tags: import rdf blazegraph

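I'm trying to use Blazegraph to run graph algorithms on ConceptNet, but first I have to import the data. The data will be write-once, read-many, so I don't need any incremental writes.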

I installed Blazegraph 2.1.1 from the .deb file. I also downloaded blazegraph.jar so that I could follow the instructions that involve running commands against blazegraph.jar.

The file assoc.nt is in N-Triples format and contains about 25 million edges. Here are a few from the beginning:

</c/af/a_foei_tog/r> </r/SenseOf> </c/af/a_foei_tog> .
</c/af/a_foei_tog/r> </r/Synonym> </c/af/jammer> .
</c/af/a_foei_tog/r> </r/Synonym> </c/af/ongelukkig> .
</c/af/a_foei_tog/r> </r/RelatedTo> </c/fr/malheureusement> .
</c/af/a_foe%C4%B1_tog/r> </r/SenseOf> </c/af/a_foe%C4%B1_tog> .
</c/af/a_foe%C4%B1_tog/r> </r/Synonym> </c/af/jammer> .
</c/af/a_foe%C4%B1_tog/r> </r/Synonym> </c/af/ongelukk%C4%B1g> .
</c/af/a_foe%C4%B1_tog/r> </r/RelatedTo> </c/fr/malheureusement> .
</c/af/a_ja_a/r> </r/SenseOf> </c/af/a_ja_a> .
</c/af/a_ja_a/r> </r/Synonym> </c/af/seker> .
</c/af/a_ja_a/r> </r/Synonym> </c/af/sekerlik> .

I got fastload.properties from the Blazegraph samples on GitHub, but then changed the end:

  • I added com.bigdata.journal.AbstractJournal.file=blazegraph.jnl, because otherwise it would tell me a property was missing.

  • I changed bufferMode from DiskRW to Disk, because someone's property file said this would give me write-once, read-many semantics, which is exactly what I want.

I ran the bulk data loader against assoc.nt with this properties file. It spun the CPU for a few minutes, but in the end it did not seem to have added anything.

Here is my final fastload.properties:

# This configuration turns off incremental inference for load and retract, so
# you must explicitly force these operations if you want to compute the closure
# of the knowledge base.  Forcing the closure requires punching through the SAIL
# layer.  Of course, if you are not using inference then this configuration is
# just the ticket and is quite fast.

# set the initial and maximum extent of the journal
com.bigdata.journal.AbstractJournal.initialExtent=209715200
com.bigdata.journal.AbstractJournal.maximumExtent=209715200

# turn off automatic inference in the SAIL
com.bigdata.rdf.sail.truthMaintenance=false

# don't store justification chains, meaning retraction requires full manual
# re-closure of the database
com.bigdata.rdf.store.AbstractTripleStore.justify=false

# turn off the statement identifiers feature for provenance
com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false

# turn off the free text index
com.bigdata.rdf.store.AbstractTripleStore.textIndex=false

com.bigdata.journal.AbstractJournal.bufferMode=Disk
com.bigdata.journal.AbstractJournal.file=blazegraph.jnl
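
Since the loader complained about a missing property before the journal file setting was added, it can help to check the properties file before starting a long load. The following is a minimal Python sketch, not part of Blazegraph: `parse_properties` and `missing_keys` are hypothetical helpers, the parser handles only simple `key=value` lines and comments (not the full Java properties syntax), and the list of required keys is just the two settings edited in the question, not an official list.

```python
# Hypothetical helper: the two settings edited in the question.
# This is an assumption for illustration, not Blazegraph's list of required keys.
REQUIRED = (
    "com.bigdata.journal.AbstractJournal.file",
    "com.bigdata.journal.AbstractJournal.bufferMode",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that have no '=' separator
            props[key.strip()] = value.strip()
    return props


def missing_keys(props, required=REQUIRED):
    """Return the required keys that are absent from the parsed properties."""
    return [key for key in required if key not in props]
```

Calling `missing_keys(parse_properties(open("fastload.properties").read()))` on the file shown above should return an empty list.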

1 answer:

Answer 0: (score: 3)

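I believe I found the answer to the problem I was having.

When Blazegraph imports N-Triples data, it skips relative URIs. The fact that my URIs were relative was my mistake; it seems that only absolute URIs are allowed in N-Triples. Still, it would have been nice for Blazegraph to tell me this instead of failing silently.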
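
Because the loader gave no error, it may be worth scanning the file for relative URIs before loading. Here is a minimal Python sketch of such a check. The helper names are hypothetical, and the regex treats anything inside angle brackets as a URI, which is a simplification that fits data like mine (URIs only, no literals containing angle brackets).

```python
import re

# Every <...> reference on a line (simplification: assumes no '<' inside literals).
IRI_REF = re.compile(r"<([^<>\s]*)>")
# An absolute URI starts with a scheme followed by ':' (for example "http:").
SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")


def relative_iris(line):
    """Return the URIs on one N-Triples line that have no scheme."""
    return [iri for iri in IRI_REF.findall(line) if not SCHEME.match(iri)]


def count_relative_lines(lines):
    """Count the lines that contain at least one relative URI."""
    return sum(1 for line in lines if relative_iris(line))
```

Running `count_relative_lines(open("assoc.nt", encoding="utf-8"))` on the original file would report a nonzero count, which is the signal that the data needs fixing before the load.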

I prefixed all of my URIs with http:// and a domain name, and now it is loading the data. Here is what my data looks like now:

<http://api.conceptnet.io/c/af/a_foei_tog/r> <http://api.conceptnet.io/r/SenseOf> <http://api.conceptnet.io/c/af/a_foei_tog> .
<http://api.conceptnet.io/c/af/a_foei_tog/r> <http://api.conceptnet.io/r/Synonym> <http://api.conceptnet.io/c/af/jammer> .
<http://api.conceptnet.io/c/af/a_foei_tog/r> <http://api.conceptnet.io/r/Synonym> <http://api.conceptnet.io/c/af/ongelukkig> .
<http://api.conceptnet.io/c/af/a_foei_tog/r> <http://api.conceptnet.io/r/RelatedTo> <http://api.conceptnet.io/c/fr/malheureusement> .
<http://api.conceptnet.io/c/af/a_foe%C4%B1_tog/r> <http://api.conceptnet.io/r/SenseOf> <http://api.conceptnet.io/c/af/a_foe%C4%B1_tog> .
<http://api.conceptnet.io/c/af/a_foe%C4%B1_tog/r> <http://api.conceptnet.io/r/Synonym> <http://api.conceptnet.io/c/af/jammer> .
<http://api.conceptnet.io/c/af/a_foe%C4%B1_tog/r> <http://api.conceptnet.io/r/Synonym> <http://api.conceptnet.io/c/af/ongelukk%C4%B1g> .
<http://api.conceptnet.io/c/af/a_foe%C4%B1_tog/r> <http://api.conceptnet.io/r/RelatedTo> <http://api.conceptnet.io/c/fr/malheureusement> .
<http://api.conceptnet.io/c/af/a_ja_a/r> <http://api.conceptnet.io/r/SenseOf> <http://api.conceptnet.io/c/af/a_ja_a> .
<http://api.conceptnet.io/c/af/a_ja_a/r> <http://api.conceptnet.io/r/Synonym> <http://api.conceptnet.io/c/af/seker> .
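
The rewrite itself can be scripted. This is a rough Python sketch of one way to do it, not necessarily how I did it: `absolutize` and `convert` are hypothetical names, the base is the domain from the data above, and the regex only touches references that start with `/`, so lines that are already absolute pass through unchanged.

```python
import re

BASE = "http://api.conceptnet.io"  # domain used in the data above

# A relative reference of the form </path...>; absolute URIs do not match.
RELATIVE_IRI = re.compile(r"<(/[^<>\s]*)>")


def absolutize(line, base=BASE):
    """Prefix every relative </...> reference on a line with the base."""
    return RELATIVE_IRI.sub(lambda m: "<" + base + m.group(1) + ">", line)


def convert(src_path, dst_path, base=BASE):
    """Stream an N-Triples file line by line, writing the absolutized copy."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(absolutize(line, base))
```

Streaming line by line keeps memory use flat, which matters for a file with about 25 million lines.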

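I got some alarming output that seems to suggest it takes 1 to 10 seconds to load each "record", but I think these warnings are misleading, because they only appear when loading slows down significantly: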

WARN : AbstractBTree.java:3758: wrote: name=kb.spo.OSP, 1 records (#nodes=1, #leaves=0) in 14582ms : addrRoot=22869767568228938
WARN : AbstractBTree.java:3758: wrote: name=kb.spo.POS, 1 records (#nodes=1, #leaves=0) in 14582ms : addrRoot=22869765391385095
WARN : AbstractBTree.java:3758: wrote: name=kb.spo.OSP, 9 records (#nodes=5, #leaves=4) in 10690ms : addrRoot=25508598331212042
WARN : AbstractBTree.java:3758: wrote: name=kb.spo.POS, 1 records (#nodes=1, #leaves=0) in 9335ms : addrRoot=38702680415142364
WARN : AbstractBTree.java:3758: wrote: name=kb.spo.POS, 9 records (#nodes=6, #leaves=3) in 6932ms : addrRoot=63331668311671368
WARN : AbstractBTree.java:3758: wrote: name=kb.spo.POS, 1 records (#nodes=1, #leaves=0) in 11326ms : addrRoot=80044185196954272

Despite the warnings, it loaded the 25 million edges in about 8 minutes, which is not bad.
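
To put the warnings in perspective, the slow writes can be pulled out of the log and compared with the overall rate. The Python sketch below is only illustrative: the regex is fitted to the WARN lines shown above, the function names are hypothetical, and both the edge count and the duration are the approximate figures from this answer.

```python
import re

# Fitted to the "wrote:" WARN lines above: index name, record count, elapsed ms.
WROTE = re.compile(r"wrote: name=(\S+), (\d+) records .* in (\d+)ms")


def parse_wrote(line):
    """Return (index_name, records, millis) for a 'wrote:' line, else None."""
    m = WROTE.search(line)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), int(m.group(3))


def edges_per_second(edges, seconds):
    """Average load rate over the whole run."""
    return edges / seconds


# About 25 million edges in about 8 minutes (both figures are approximate).
overall_rate = edges_per_second(25_000_000, 8 * 60)
```

With those figures the average works out to roughly 52,000 edges per second, so the multi-second writes in the warnings are clearly outliers and not the typical cost per record.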