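I'm trying to use Blazegraph to run graph algorithms on ConceptNet, but first I have to import the data. The data will be write-once, read-many, so I don't need any incremental writes.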
I installed Blazegraph 2.1.1 from the .deb file. I also downloaded blazegraph.jar so that I could follow the instructions that involve running commands against blazegraph.jar.
The file assoc.nt is in N-Triples format and contains about 25 million edges. Here are a few from the beginning:
</c/af/a_foei_tog/r> </r/SenseOf> </c/af/a_foei_tog> .
</c/af/a_foei_tog/r> </r/Synonym> </c/af/jammer> .
</c/af/a_foei_tog/r> </r/Synonym> </c/af/ongelukkig> .
</c/af/a_foei_tog/r> </r/RelatedTo> </c/fr/malheureusement> .
</c/af/a_foe%C4%B1_tog/r> </r/SenseOf> </c/af/a_foe%C4%B1_tog> .
</c/af/a_foe%C4%B1_tog/r> </r/Synonym> </c/af/jammer> .
</c/af/a_foe%C4%B1_tog/r> </r/Synonym> </c/af/ongelukk%C4%B1g> .
</c/af/a_foe%C4%B1_tog/r> </r/RelatedTo> </c/fr/malheureusement> .
</c/af/a_ja_a/r> </r/SenseOf> </c/af/a_ja_a> .
</c/af/a_ja_a/r> </r/Synonym> </c/af/seker> .
</c/af/a_ja_a/r> </r/Synonym> </c/af/sekerlik> .
I got fastload.properties from the Blazegraph samples on GitHub, but then changed the ending: I added com.bigdata.journal.AbstractJournal.file=blazegraph.jnl, because otherwise it told me the property was missing.
I changed bufferMode from DiskRW to Disk, because someone's property file said this would give me write-once, read-many semantics, which is exactly what I want.
Here is my final fastload.properties:
# This configuration turns off incremental inference for load and retract, so
# you must explicitly force these operations if you want to compute the closure
# of the knowledge base. Forcing the closure requires punching through the SAIL
# layer. Of course, if you are not using inference then this configuration is
# just the ticket and is quite fast.
# set the initial and maximum extent of the journal
com.bigdata.journal.AbstractJournal.initialExtent=209715200
com.bigdata.journal.AbstractJournal.maximumExtent=209715200
# turn off automatic inference in the SAIL
com.bigdata.rdf.sail.truthMaintenance=false
# don't store justification chains, meaning retraction requires full manual
# re-closure of the database
com.bigdata.rdf.store.AbstractTripleStore.justify=false
# turn off the statement identifiers feature for provenance
com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false
# turn off the free text index
com.bigdata.rdf.store.AbstractTripleStore.textIndex=false
com.bigdata.journal.AbstractJournal.bufferMode=Disk
com.bigdata.journal.AbstractJournal.file=blazegraph.jnl
I ran the load command, and it spun the CPU for a few minutes, but in the end nothing appeared to have been added.
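Answer 0 (score: 3)
I believe I've found the answer to my own problem.
When Blazegraph imports N-Triples data, it skips relative URIs. The fact that my URIs were relative is my mistake; it appears that only absolute URIs are allowed in N-Triples, but it would have been nice if Blazegraph had told me so instead of failing silently.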
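Since the loader gave no warning, a quick pre-flight check can catch this before a long import. The following is a minimal sketch, not part of Blazegraph: the function names are hypothetical, and it assumes every <...> term on a line is an IRI (true for the ConceptNet sample, but it could misfire on string literals that contain angle brackets).

```python
import re

# Matches any <...> term in an N-Triples line.
IRI_TERM = re.compile(r"<([^<>\s]*)>")
# An absolute IRI starts with a scheme such as "http:" (RFC 3986 scheme syntax).
SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")


def relative_iris(line):
    """Return the IRIs in one N-Triples line that have no scheme."""
    return [iri for iri in IRI_TERM.findall(line) if not SCHEME.match(iri)]


def count_relative_lines(lines):
    """Count the lines that contain at least one relative IRI."""
    return sum(1 for line in lines if relative_iris(line))
```

Running count_relative_lines over the open file before loading would report how many triples are at risk of being skipped.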
I prefixed all of my URIs with http:// and a domain name, and now it loads the data. Here is what my data looks like now:
<http://api.conceptnet.io/c/af/a_foei_tog/r> <http://api.conceptnet.io/r/SenseOf> <http://api.conceptnet.io/c/af/a_foei_tog> .
<http://api.conceptnet.io/c/af/a_foei_tog/r> <http://api.conceptnet.io/r/Synonym> <http://api.conceptnet.io/c/af/jammer> .
<http://api.conceptnet.io/c/af/a_foei_tog/r> <http://api.conceptnet.io/r/Synonym> <http://api.conceptnet.io/c/af/ongelukkig> .
<http://api.conceptnet.io/c/af/a_foei_tog/r> <http://api.conceptnet.io/r/RelatedTo> <http://api.conceptnet.io/c/fr/malheureusement> .
<http://api.conceptnet.io/c/af/a_foe%C4%B1_tog/r> <http://api.conceptnet.io/r/SenseOf> <http://api.conceptnet.io/c/af/a_foe%C4%B1_tog> .
<http://api.conceptnet.io/c/af/a_foe%C4%B1_tog/r> <http://api.conceptnet.io/r/Synonym> <http://api.conceptnet.io/c/af/jammer> .
<http://api.conceptnet.io/c/af/a_foe%C4%B1_tog/r> <http://api.conceptnet.io/r/Synonym> <http://api.conceptnet.io/c/af/ongelukk%C4%B1g> .
<http://api.conceptnet.io/c/af/a_foe%C4%B1_tog/r> <http://api.conceptnet.io/r/RelatedTo> <http://api.conceptnet.io/c/fr/malheureusement> .
<http://api.conceptnet.io/c/af/a_ja_a/r> <http://api.conceptnet.io/r/SenseOf> <http://api.conceptnet.io/c/af/a_ja_a> .
<http://api.conceptnet.io/c/af/a_ja_a/r> <http://api.conceptnet.io/r/Synonym> <http://api.conceptnet.io/c/af/seker> .
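The rewrite from relative to absolute URIs can be done in one streaming pass. This is a rough sketch of one way to do it, not necessarily how the file above was produced: the helper names are hypothetical, the base http://api.conceptnet.io is taken from the sample, and it assumes every relative term starts with a slash, as in the original data.

```python
import re

BASE = "http://api.conceptnet.io"  # base taken from the sample data above

# A relative term of the form </path...>; absolute IRIs start with a scheme,
# not "/", so they are left untouched.
RELATIVE_TERM = re.compile(r"<(/[^<>\s]*)>")


def absolutize(line, base=BASE):
    """Prefix every </...> term in an N-Triples line with the base."""
    return RELATIVE_TERM.sub(lambda m: "<" + base + m.group(1) + ">", line)


def absolutize_lines(lines, base=BASE):
    """Lazily rewrite an iterable of lines, e.g. an open file object."""
    for line in lines:
        yield absolutize(line, base)


# Example use with files (the output file name is hypothetical):
# with open("assoc.nt", encoding="utf-8") as src, \
#         open("assoc_absolute.nt", "w", encoding="utf-8") as dst:
#     dst.writelines(absolutize_lines(src))
```

Because the rewrite only touches terms that begin with a slash, running it a second time on already-fixed data changes nothing.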
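I got some alarming output that seems to indicate it takes 1 to 10 seconds or more to load each "record", but I think these warnings are misleading, because they only appear when loading slows down significantly: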
WARN : AbstractBTree.java:3758: wrote: name=kb.spo.OSP, 1 records (#nodes=1, #leaves=0) in 14582ms : addrRoot=22869767568228938
WARN : AbstractBTree.java:3758: wrote: name=kb.spo.POS, 1 records (#nodes=1, #leaves=0) in 14582ms : addrRoot=22869765391385095
WARN : AbstractBTree.java:3758: wrote: name=kb.spo.OSP, 9 records (#nodes=5, #leaves=4) in 10690ms : addrRoot=25508598331212042
WARN : AbstractBTree.java:3758: wrote: name=kb.spo.POS, 1 records (#nodes=1, #leaves=0) in 9335ms : addrRoot=38702680415142364
WARN : AbstractBTree.java:3758: wrote: name=kb.spo.POS, 9 records (#nodes=6, #leaves=3) in 6932ms : addrRoot=63331668311671368
WARN : AbstractBTree.java:3758: wrote: name=kb.spo.POS, 1 records (#nodes=1, #leaves=0) in 11326ms : addrRoot=80044185196954272
Despite the warnings, it loaded the 25 million edges in about 8 minutes, which isn't bad.
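To put the warnings in perspective, 25 million edges in roughly 8 minutes works out to about 52,000 edges per second overall, which is hard to reconcile with seconds per record as a typical cost. The sketch below parses the WARN lines and computes that rate; the regular expression is written against the log lines shown above, not against any documented Blazegraph log format, and the function names are hypothetical.

```python
import re

# Pattern derived from the sample WARN lines, e.g.
# "wrote: name=kb.spo.OSP, 9 records (#nodes=5, #leaves=4) in 10690ms"
WROTE = re.compile(
    r"wrote: name=([^,\s]+), (\d+) records "
    r"\(#nodes=(\d+), #leaves=(\d+)\) in (\d+)ms"
)


def parse_wrote(line):
    """Extract the index name, counts and duration from one WARN line."""
    m = WROTE.search(line)
    if m is None:
        return None
    name, records, nodes, leaves, ms = m.groups()
    return {
        "index": name,
        "records": int(records),
        "nodes": int(nodes),
        "leaves": int(leaves),
        "ms": int(ms),
    }


def edges_per_second(edges, seconds):
    """Average load rate over the whole run."""
    return edges / seconds
```

Summing the "ms" values of all parsed lines and comparing the total with the wall-clock time would show how much of the run the slow writes actually account for.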