Iterate over a CSV file and query property values - Cypher Neo4j

Time: 2019-01-21 21:14:00

Tags: loops csv filter neo4j cypher

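Please see the image of the CSV file. I am working with Cypher in Neo4j. As you can see, each activity with a timestamp belongs to a case_id. Many activities belong to the same case_id (here you can see case_id 3, 2, and 1, but imagine there are many more). I want to group the activities that belong to the same case ID and run the same query on each group (the grouping is essential).

Is there another way besides rewriting the same query for each group, as in the following three steps?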

1.

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///XY" AS row
WITH toInteger(row.case_id) AS cid, row
WHERE cid=3
CREATE (act:Activity {caseId: cid, activityName: row.activity, time: row.timestamp})

'QUERY'

2.

LOAD CSV WITH HEADERS FROM "file:///XY" AS row
WITH toInteger(row.case_id) AS cid, row
WHERE cid=2
CREATE (act:Activity {caseId: cid, activityName: row.activity, time: row.timestamp})

'QUERY'

3.

LOAD CSV WITH HEADERS FROM "file:///XY" AS row
WITH toInteger(row.case_id) AS cid, row
WHERE cid=1
CREATE (act:Activity {caseId: cid, activityName: row.activity, time: row.timestamp})

'QUERY'

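So basically I want to generalize WHERE cid=3 (or 2 or 1) by iterating over all distinct case IDs without naming them explicitly. Something like a Java for-each: for each element in array (array content: group by case_id) do QUERY

Any ideas?

Thanks in advance. If this sounds too cryptic, I will gladly provide a better description.

Update: here is the query: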

MATCH (act:Activity)
WHERE act.caseId = 1 // and here I want to be able to simplify for EVERY caseId
WITH act ORDER BY act.time ASC
WITH apoc.coll.frequencies(apoc.coll.pairsMin(COLLECT(act.activityName))) AS g
UNWIND g AS p
RETURN *


1 answer:

Answer 0: (score: 1)

It seems to me that a single LOAD CSV query is enough: just set caseId to the integer value of row.case_id:

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///XY" AS row
WITH toInteger(row.case_id) AS cid, row
CREATE (act:Activity {caseId: cid, activityName: row.activity, time: row.timestamp})
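After an import like this, a quick check of which cases were loaded might look like the sketch below. It assumes only the Activity label and caseId property created above; the aliases caseId and activities are illustrative names. Because caseId is the only non-aggregated expression in RETURN, Cypher uses it as the grouping key, so no case ID has to be named explicitly.

```cypher
// Sketch: count the imported activities per case.
// caseId is the implicit grouping key; count(*) is evaluated per distinct caseId.
MATCH (act:Activity)
RETURN act.caseId AS caseId, count(*) AS activities
ORDER BY caseId
```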

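OK, I see that you want to run some query on each group. Can you explain why running the query after the CSV has been loaded does not work?

Would running the query after the import work for you?

More information about the query you intend to run would help.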
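If running the query after the import is acceptable, one possible way to apply the query from the update to every case at once is to let aggregation do the grouping instead of filtering on a single caseId. This is a minimal sketch, not a tested solution. It assumes APOC is installed, that the time property sorts chronologically as stored (for example, ISO-formatted strings), and that collect() keeps the order established by the preceding ORDER BY, which is a common pattern but worth verifying on your Neo4j version. The aliases names, pair, and occurrences are illustrative.

```cypher
// Sketch: one pass over all cases instead of one query per caseId.
MATCH (act:Activity)
// Sort first so each case's activities are collected in time order.
WITH act ORDER BY act.caseId, act.time ASC
// caseId is the grouping key; collect() aggregates per distinct caseId.
WITH act.caseId AS caseId, collect(act.activityName) AS names
// pairsMin builds consecutive pairs; frequencies counts each distinct pair.
UNWIND apoc.coll.frequencies(apoc.coll.pairsMin(names)) AS p
RETURN caseId, p.item AS pair, p.count AS occurrences
ORDER BY caseId
```

Each result row then carries the case it belongs to, so the per-case results stay separate without a WHERE clause for each case ID. A case with only one activity yields no pairs and therefore no rows.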