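I am reading data in a StreamSets pipeline from an active file directory, to which .csv files are uploaded remotely, and loading that data into a Neo4j database. The steps I use are -
Right now I do all of this with a JDBC query, and the Cypher query I use is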
MERGE (m:OBSERVATION{
SerialNumber: "${record:value('/SerialNumber')}",
Test_Stage: "${record:value('/Test_Stage')}",
CUR: "${record:value('/CUR')}",
VOLT: "${record:value('/VOLT')}",
Rel_Lot: "${record:value('/Rel_Lot')}",
TimestampINT: "${record:value('/TimestampINT')}",
Temp: "${record:value('/Temp')}",
LP: "${record:value('/LP')}",
MON: "${record:value('/MON')}"
})
MERGE (t:CSV{
SerialNumber: "${record:value('/SerialNumber')}",
Test_Stage: "${record:value('/Test_Stage')}",
TimestampINT: "${record:value('/TimestampINT')}"
})
WITH m
MATCH (t:CSV)
WHERE t.SerialNumber = m.SerialNumber AND t.Test_Stage = m.Test_Stage AND t.TimestampINT = m.TimestampINT
MERGE (m)-[:PART_OF]->(t)
WITH t, t.TimestampINT AS TimestampINT
MATCH (rl:Burn_In_Test) where rl.SerialNumber=t.SerialNumber and rl.Test_Stage=t.Test_Stage and rl.TimestampINT<TimestampINT
SET rl.TimestampINT=TimestampINT
WITH t
MATCH (rl:Burn_In_Test) where rl.SerialNumber=t.SerialNumber and rl.Test_Stage=t.Test_Stage
MERGE (t)-[:POINTS_TO]->(rl)
WITH rl
MATCH (t:CSV)-[r:POINTS_TO]->(rl) WHERE t.TimestampINT<rl.TimestampINT
DELETE r
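Right now this process is very slow: it takes about 15 minutes to process 10 records. Can it be optimized further?
Answer 0 (score: 1)
The best practice with MERGE is to merge on a single property and then use SET to add the other properties.
If I assume that the SerialNumber property is unique for each node (which may not be the case), it would look like: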
MERGE (m:OBSERVATION{SerialNumber: "${record:value('/SerialNumber')}"})
SET m.Test_Stage = "${record:value('/Test_Stage')}",
m.CUR= "${record:value('/CUR')}",
m.VOLT= "${record:value('/VOLT')}",
m.Rel_Lot= "${record:value('/Rel_Lot')}",
m.TimestampINT = "${record:value('/TimestampINT')}",
m.Temp= "${record:value('/Temp')}",
m.LP= "${record:value('/LP')}",
m.MON= "${record:value('/MON')}"
MERGE (t:CSV{
SerialNumber: "${record:value('/SerialNumber')}"
})
SET t.Test_Stage = "${record:value('/Test_Stage')}",
t.TimestampINT = "${record:value('/TimestampINT')}"
WITH m
MATCH (t:CSV)
WHERE t.SerialNumber = m.SerialNumber AND t.Test_Stage = m.Test_Stage AND t.TimestampINT = m.TimestampINT
MERGE (m)-[:PART_OF]->(t)
WITH t, t.TimestampINT AS TimestampINT
MATCH (rl:Burn_In_Test) where rl.SerialNumber=t.SerialNumber and rl.Test_Stage=t.Test_Stage and rl.TimestampINT<TimestampINT
SET rl.TimestampINT=TimestampINT
WITH t
MATCH (rl:Burn_In_Test) where rl.SerialNumber=t.SerialNumber and rl.Test_Stage=t.Test_Stage
MERGE (t)-[:POINTS_TO]->(rl)
WITH rl
MATCH (t:CSV)-[r:POINTS_TO]->(rl) WHERE t.TimestampINT<rl.TimestampINT
DELETE r
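One more thing to add: I would probably split this into two queries. The first is the import part, and the second deletes the relationships. Also add unique constraints and indexes wherever you can.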
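As a rough sketch of the indexing suggestion, the statements below create indexes on the properties the query matches on. The index names (csv_key, observation_key, burn_in_key) are made up for illustration, and the syntax assumes a reasonably recent Neo4j release; older versions use the CREATE INDEX ON :Label(property) form instead. A uniqueness constraint is only appropriate if the chosen properties really identify a single node, which the question does not confirm, so plain indexes are shown here.

```cypher
// Hypothetical index names; run once, before the import.
// Composite index for the CSV lookup on its three key properties.
CREATE INDEX csv_key IF NOT EXISTS
FOR (t:CSV) ON (t.SerialNumber, t.Test_Stage, t.TimestampINT);

// Same key properties on OBSERVATION, so MERGE can seek instead of scanning the label.
CREATE INDEX observation_key IF NOT EXISTS
FOR (m:OBSERVATION) ON (m.SerialNumber, m.Test_Stage, m.TimestampINT);

// Burn_In_Test is matched on SerialNumber and Test_Stage only.
CREATE INDEX burn_in_key IF NOT EXISTS
FOR (rl:Burn_In_Test) ON (rl.SerialNumber, rl.Test_Stage);
```

The cleanup step could then run as its own query after the import, for example once per batch rather than once per record. This is the last part of the original query made standalone, with the Burn_In_Test label written out on rl:

```cypher
// Remove POINTS_TO relationships from CSV nodes that are older than the test they point to.
MATCH (t:CSV)-[r:POINTS_TO]->(rl:Burn_In_Test)
WHERE t.TimestampINT < rl.TimestampINT
DELETE r
```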