Neo4j: tracking activity

Date: 2016-03-17 13:08:12

Tags: neo4j cypher

My problem is that the code appears to be doing something and then stops, with no messages in the message.log file or in the browser.

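Update: what seems to be happening is that the server finishes its work, but the browser is never notified and so never reports the result.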

I am running the following code:

USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:/D:/OpenData/ProKB/tmp/KbText.csv" as line 
CREATE (kt:KbText {kbid: line.kbid, seq: line.seq,kbtext: line.kbtext})
with kt 
match (kb:KBase {kbid: kt.kbid})
with  split(tolower(kt.kbtext), " ") as words, kb, kt 
with [w in words WHERE NOT w in ["the", "and", "i", "to", "or", "", "Knowledge", "Article"]] as txt, kb, kt
foreach (wd in txt | 
    merge (v:Vocabulary {word: wd})
    merge (kt)-[:WORD]->(v)
    merge (v)-[:KB]->(kb)
    )
with txt, kb, kt
unwind range(0, size(txt)-2) as wordnum
merge (kbs1:KbSentence {kbid: kb.kbid, word: txt[wordnum],   seq: wordnum})
merge (kbs2:KbSentence {kbid: kb.kbid, word: txt[wordnum+1], seq: wordnum+1})
merge (kbs1)-[:NEXT]->(kbs2)

merge (kbs1)-[:TEXT]->(kt)
merge (kbs2)-[:TEXT]->(kt)

merge (kt)-[:WORDSEQ]->(kbs1)
merge (kt)-[:WORDSEQ]->(kbs2)

The :schema output is:

Indexes
  ON :ErrLink(kbid)      ONLINE  
  ON :ErrLink(errnum)    ONLINE  
  ON :KBase(kbid)        ONLINE  
  ON :KBase(groupcode)   ONLINE  
  ON :KbGroup(groupcode) ONLINE  
  ON :KbGroup(kbgroup)   ONLINE  
  ON :KbLink(kbid)       ONLINE  
  ON :KbLong(kbid)       ONLINE  
  ON :KbSentence(kbid)   ONLINE  
  ON :KbSentence(seq)    ONLINE  
  ON :KbSentence(word)   ONLINE  
  ON :KbText(kbid)       ONLINE  
  ON :KbTextWord(kbid)   ONLINE  
  ON :KbTextWord(word)   ONLINE  
  ON :KbTxtWord(kbid)    ONLINE  
  ON :ProError(errnum)   ONLINE  
  ON :Vocabulary(word)   ONLINE  

No constraints

For a while I see activity in the transaction log, and then it stops. When that happens, the following messages appear in the log file:

2016-03-17 12:32:03.234+0000 INFO  [o.n.k.i.a.i.s.OnlineIndexSamplingJob] Sampled index :KbText(kbid) with 36998 unique values in sample of avg size 113992 taken from index containing 113992 entries
2016-03-17 12:32:23.244+0000 INFO  [o.n.k.i.a.i.s.OnlineIndexSamplingJob] Sampled index :KbText(kbid) with 36998 unique values in sample of avg size 123992 taken from index containing 123992 entries
2016-03-17 12:32:43.349+0000 INFO  [o.n.k.i.a.i.s.OnlineIndexSamplingJob] Sampled index :KbText(kbid) with 36998 unique values in sample of avg size 132992 taken from index containing 132992 entries
2016-03-17 12:33:03.247+0000 INFO  [o.n.k.i.a.i.s.OnlineIndexSamplingJob] Sampled index :KbText(kbid) with 36998 unique values in sample of avg size 143992 taken from index containing 143992 entries
2016-03-17 12:36:13.308+0000 INFO  [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [18762]:  Starting check pointing...
2016-03-17 12:36:13.308+0000 INFO  [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [18762]:  Starting store flush...
2016-03-17 12:36:13.721+0000 INFO  [o.n.k.i.s.c.CountsTracker] About to rotate counts store at transaction 18762 to [D:\OpenData\ProKB\Neo4j\neostore.counts.db.a], from [D:\OpenData\ProKB\Neo4j\neostore.counts.db.b].
2016-03-17 12:36:13.743+0000 INFO  [o.n.k.i.s.c.CountsTracker] Successfully rotated counts store at transaction 18762 to [D:\OpenData\ProKB\Neo4j\neostore.counts.db.a], from [D:\OpenData\ProKB\Neo4j\neostore.counts.db.b].
2016-03-17 12:36:13.915+0000 INFO  [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [18762]:  Store flush completed
2016-03-17 12:36:13.915+0000 INFO  [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [18762]:  Starting appending check point entry into the tx log...
2016-03-17 12:36:13.988+0000 INFO  [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [18762]:  Appending check point entry into the tx log completed
2016-03-17 12:36:13.988+0000 INFO  [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [18762]:  Check pointing completed
2016-03-17 12:36:13.988+0000 INFO  [o.n.k.i.t.l.p.LogPruningImpl] Log Rotation [41]:  Starting log pruning.
2016-03-17 12:36:13.988+0000 INFO  [o.n.k.i.t.l.p.LogPruningImpl] Log Rotation [41]:  Log pruning complete.

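The browser shows the spinning circle of dots, and if I open another browser window on the database, the new labels and relationships that this code should create are not there.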

How can I track what is happening during the load, so that I can see whether anything is being done and adjust the code as needed?

1 answer:

Answer 0 (score: 1)

Your LOAD CSV command shows an Eager pipe in its query plan. You can verify this with:

explain LOAD CSV WITH HEADERS FROM "file:/D:/OpenData/ProKB/tmp/KbText.csv" as line 
CREATE (kt:KbText {kbid: line.kbid, seq: line.seq,kbtext: line.kbtext})
with kt 
match (kb:KBase {kbid: kt.kbid})
with  split(tolower(kt.kbtext), " ") as words, kb, kt 
with [w in words WHERE NOT w in ["the", "and", "i", "to", "or", "", "Knowledge", "Article"]] as txt, kb, kt
foreach (wd in txt | 
    merge (v:Vocabulary {word: wd})
    merge (kt)-[:WORD]->(v)
    merge (v)-[:KB]->(kb)
    )
with txt, kb, kt
unwind range(0, size(txt)-2) as wordnum
merge (kbs1:KbSentence {kbid: kb.kbid, word: txt[wordnum],   seq: wordnum})
merge (kbs2:KbSentence {kbid: kb.kbid, word: txt[wordnum+1], seq: wordnum+1})
merge (kbs1)-[:NEXT]->(kbs2)

merge (kbs1)-[:TEXT]->(kt)
merge (kbs2)-[:TEXT]->(kt)

merge (kt)-[:WORDSEQ]->(kbs1)
merge (kt)-[:WORDSEQ]->(kbs2)


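The Eager pipe prevents periodic commit from taking place, which means the whole file is processed in one single large transaction. A transaction is built up in memory first and only flushed to disk on commit, so you need enough memory to hold all of it. In most cases you do not have that much, and the JVM may lock up doing garbage collection.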

There are several blog posts on this topic, for example: http://www.markhneedham.com/blog/2014/10/23/neo4j-cypher-avoiding-the-eager/

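The workaround is to split the statement into smaller chunks that do not show Eager, and to run each chunk separately. This of course means iterating over the CSV file multiple times.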
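As a rough sketch of what such a split could look like for the first half of the statement in the question (not a tested solution): each pass re-reads the same CSV file and does one kind of write. The third pass assumes that the pair (kbid, seq) identifies a single KbText node, which the question does not confirm. Whether a given pass is really free of Eager depends on the Neo4j version, so prefix each one with explain and inspect the plan before running it.

```cypher
// Pass 1: create the KbText nodes only (one plain CREATE per CSV row).
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:/D:/OpenData/ProKB/tmp/KbText.csv" AS line
CREATE (kt:KbText {kbid: line.kbid, seq: line.seq, kbtext: line.kbtext})

// Pass 2: create the Vocabulary nodes, one MERGE per remaining word.
// The capitalised entries of the original stop list are left out here:
// they can never match words that have already been lower-cased.
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:/D:/OpenData/ProKB/tmp/KbText.csv" AS line
UNWIND split(toLower(line.kbtext), " ") AS wd
WITH wd
WHERE NOT wd IN ["the", "and", "i", "to", "or", ""]
MERGE (v:Vocabulary {word: wd})

// Pass 3: connect nodes that already exist.
// ASSUMPTION: (kbid, seq) is unique per KbText node.
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:/D:/OpenData/ProKB/tmp/KbText.csv" AS line
MATCH (kt:KbText {kbid: line.kbid, seq: line.seq})
MATCH (kb:KBase {kbid: line.kbid})
UNWIND split(toLower(line.kbtext), " ") AS wd
WITH kt, kb, wd
WHERE NOT wd IN ["the", "and", "i", "to", "or", ""]
MATCH (v:Vocabulary {word: wd})
MERGE (kt)-[:WORD]->(v)
MERGE (v)-[:KB]->(kb)
```

Run each pass as its own statement, one after the other. The KbSentence chain (NEXT, TEXT, WORDSEQ) is left out of this sketch; it would need the same treatment, with the node MERGEs in one pass and the relationship MERGEs in later ones. If a pass still shows Eager under explain, split it further.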