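My problem is that the code seems to be doing something and then stops, with no messages in either the message.log file or the browser.
Update: what seems to be happening is that the server finishes its work, but the browser is never notified and so never reports the result.
I am running the following code: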
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:/D:/OpenData/ProKB/tmp/KbText.csv" as line
CREATE (kt:KbText {kbid: line.kbid, seq: line.seq,kbtext: line.kbtext})
with kt
match (kb:KBase {kbid: kt.kbid})
with split(tolower(kt.kbtext), " ") as words, kb, kt
with [w in words WHERE NOT w in ["the", "and", "i", "to", "or", "", "Knowledge", "Article"]] as txt, kb, kt
foreach (wd in txt |
merge (v:Vocabulary {word: wd})
merge (kt)-[:WORD]->(v)
merge (v)-[:KB]->(kb)
)
with txt, kb, kt
unwind range(0, size(txt)-2) as wordnum
merge (kbs1:KbSentence {kbid: kb.kbid, word: txt[wordnum], seq: wordnum})
merge (kbs2:KbSentence {kbid: kb.kbid, word: txt[wordnum+1], seq: wordnum+1})
merge (kbs1)-[:NEXT]->(kbs2)
merge (kbs1)-[:TEXT]->(kt)
merge (kbs2)-[:TEXT]->(kt)
merge (kt)-[:WORDSEQ]->(kbs1)
merge (kt)-[:WORDSEQ]->(kbs2)
The schema is:
Indexes
  ON :ErrLink(kbid) ONLINE
  ON :ErrLink(errnum) ONLINE
  ON :KBase(kbid) ONLINE
  ON :KBase(groupcode) ONLINE
  ON :KbGroup(groupcode) ONLINE
  ON :KbGroup(kbgroup) ONLINE
  ON :KbLink(kbid) ONLINE
  ON :KbLong(kbid) ONLINE
  ON :KbSentence(kbid) ONLINE
  ON :KbSentence(seq) ONLINE
  ON :KbSentence(word) ONLINE
  ON :KbText(kbid) ONLINE
  ON :KbTextWord(kbid) ONLINE
  ON :KbTextWord(word) ONLINE
  ON :KbTxtWord(kbid) ONLINE
  ON :ProError(errnum) ONLINE
  ON :Vocabulary(word) ONLINE
No constraints
For a while I see activity in the transaction log, and then it stops. When that happens, the following messages appear in the log file:
2016-03-17 12:32:03.234+0000 INFO [o.n.k.i.a.i.s.OnlineIndexSamplingJob] Sampled index :KbText(kbid) with 36998 unique values in sample of avg size 113992 taken from index containing 113992 entries
2016-03-17 12:32:23.244+0000 INFO [o.n.k.i.a.i.s.OnlineIndexSamplingJob] Sampled index :KbText(kbid) with 36998 unique values in sample of avg size 123992 taken from index containing 123992 entries
2016-03-17 12:32:43.349+0000 INFO [o.n.k.i.a.i.s.OnlineIndexSamplingJob] Sampled index :KbText(kbid) with 36998 unique values in sample of avg size 132992 taken from index containing 132992 entries
2016-03-17 12:33:03.247+0000 INFO [o.n.k.i.a.i.s.OnlineIndexSamplingJob] Sampled index :KbText(kbid) with 36998 unique values in sample of avg size 143992 taken from index containing 143992 entries
2016-03-17 12:36:13.308+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [18762]: Starting check pointing...
2016-03-17 12:36:13.308+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [18762]: Starting store flush...
2016-03-17 12:36:13.721+0000 INFO [o.n.k.i.s.c.CountsTracker] About to rotate counts store at transaction 18762 to [D:\OpenData\ProKB\Neo4j\neostore.counts.db.a], from [D:\OpenData\ProKB\Neo4j\neostore.counts.db.b].
2016-03-17 12:36:13.743+0000 INFO [o.n.k.i.s.c.CountsTracker] Successfully rotated counts store at transaction 18762 to [D:\OpenData\ProKB\Neo4j\neostore.counts.db.a], from [D:\OpenData\ProKB\Neo4j\neostore.counts.db.b].
2016-03-17 12:36:13.915+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [18762]: Store flush completed
2016-03-17 12:36:13.915+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [18762]: Starting appending check point entry into the tx log...
2016-03-17 12:36:13.988+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [18762]: Appending check point entry into the tx log completed
2016-03-17 12:36:13.988+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [18762]: Check pointing completed
2016-03-17 12:36:13.988+0000 INFO [o.n.k.i.t.l.p.LogPruningImpl] Log Rotation [41]: Starting log pruning.
2016-03-17 12:36:13.988+0000 INFO [o.n.k.i.t.l.p.LogPruningImpl] Log Rotation [41]: Log pruning complete.
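The browser keeps showing the spinning circle of dots, and if I open another browser window on the database, the new labels/relationships that this code should create are not there.
How can I track what is happening during the load, so that I can see whether anything is being done and adjust the code as needed?
Answer 0 (score: 1)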
Your LOAD CSV command shows an eager pipe in its query plan. You can verify this with:
explain LOAD CSV WITH HEADERS FROM "file:/D:/OpenData/ProKB/tmp/KbText.csv" as line
CREATE (kt:KbText {kbid: line.kbid, seq: line.seq,kbtext: line.kbtext})
with kt
match (kb:KBase {kbid: kt.kbid})
with split(tolower(kt.kbtext), " ") as words, kb, kt
with [w in words WHERE NOT w in ["the", "and", "i", "to", "or", "", "Knowledge", "Article"]] as txt, kb, kt
foreach (wd in txt |
merge (v:Vocabulary {word: wd})
merge (kt)-[:WORD]->(v)
merge (v)-[:KB]->(kb)
)
with txt, kb, kt
unwind range(0, size(txt)-2) as wordnum
merge (kbs1:KbSentence {kbid: kb.kbid, word: txt[wordnum], seq: wordnum})
merge (kbs2:KbSentence {kbid: kb.kbid, word: txt[wordnum+1], seq: wordnum+1})
merge (kbs1)-[:NEXT]->(kbs2)
merge (kbs1)-[:TEXT]->(kt)
merge (kbs2)-[:TEXT]->(kt)
merge (kt)-[:WORDSEQ]->(kbs1)
merge (kt)-[:WORDSEQ]->(kbs2)
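The eager pipe prevents periodic commit from taking effect, which means the complete file is processed in one single large transaction. Since a transaction has to be built up in memory before it is flushed to disk on commit, you need enough memory to hold all of it. In most cases you will not have that much, and the JVM may lock up because it can no longer complete garbage collection.
There are several blog posts on this topic, for example: http://www.markhneedham.com/blog/2014/10/23/neo4j-cypher-avoiding-the-eager/.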
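The workaround is to split the statement into smaller chunks that do not show eager in their plans, and to run each chunk separately. This of course means you iterate over the CSV file multiple times.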
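As a rough illustration of that split, the first few passes might look like the sketch below. It is only a sketch under assumptions: it assumes that (kbid, seq) identifies a single KbText node, it keeps the tolower call from the question, and it lowercases the two capitalised stop words, because the text is lowercased before filtering and "Knowledge" / "Article" could otherwise never match. Run the statements one at a time, and run EXPLAIN on each first, since whether a given statement plans without eager depends on the Neo4j version.

```cypher
// Pass 1: create only the KbText nodes (a plain CREATE, nothing to merge against)
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:/D:/OpenData/ProKB/tmp/KbText.csv" AS line
CREATE (kt:KbText {kbid: line.kbid, seq: line.seq, kbtext: line.kbtext})

// Pass 2: create only the Vocabulary nodes
// Assumption: stop words are lowercased here, unlike in the original list
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:/D:/OpenData/ProKB/tmp/KbText.csv" AS line
UNWIND split(tolower(line.kbtext), " ") AS wd
WITH wd WHERE NOT wd IN ["the", "and", "i", "to", "or", "", "knowledge", "article"]
MERGE (v:Vocabulary {word: wd})

// Pass 3: connect KbText nodes to the words created in pass 2
// Assumption: kbid plus seq is enough to find the KbText node for a CSV row
// Stop words have no Vocabulary node, so the MATCH on v drops them
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:/D:/OpenData/ProKB/tmp/KbText.csv" AS line
MATCH (kt:KbText {kbid: line.kbid, seq: line.seq})
UNWIND split(tolower(line.kbtext), " ") AS wd
MATCH (v:Vocabulary {word: wd})
MERGE (kt)-[:WORD]->(v)
```

The :KB relationships and the KbSentence chain would follow the same pattern in further passes. Once periodic commit is actually in effect, a query such as MATCH (kt:KbText) RETURN count(kt), run from a second session while pass 1 is loading, should show the count growing in steps, which gives a simple way to watch progress.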