Neo4j: using MERGE with null values

Date: 2016-11-24 14:06:15

Tags: csv neo4j null cypher

I know this question has been asked a few times before, but the answers didn't solve my problem. I'm trying to execute this query:

USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM 'file:///C:/Users/Zona5/Documents/Neo4j/checkIntel/import/personaldata.csv' AS line1 MERGE (a:Address1 {address_name1:line1.address1})

But I get the error: Cannot merge node using null property value for address_name1

Others have suggested using:

USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM 'file:///C:/Users/Zona5/Documents/Neo4j/checkIntel/import/personaldata.csv' AS line1 MERGE (a:Address1) ON CREATE SET a.address_name1=line1.address1 ON MATCH SET a.address_name1=line1.address1

But that solution only works when the node has several properties. In my case it has only the address_name1 property.

Is there a way around this, for example by substituting a word for the null value in the query before it is assigned to address_name1, or some other solution?
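To see why both attempts misbehave, here is a toy in-memory model of MERGE in plain Python. It is a simulation under stated assumptions, not Neo4j code, and `merge_node`, `load_keyless` and the sample addresses are hypothetical. It assumes MERGE matches nodes whose key properties all equal the given values, refuses a null key value, and that `SET` to null removes a property. With no key at all, as in `MERGE (a:Address1)`, every row after the first matches the node the first row created, so the whole file collapses onto a single Address1 node; the ON CREATE/ON MATCH SET workaround only helps when some other non-null property can serve as the MERGE key.

```python
# Toy in-memory model of Cypher MERGE (illustration only, not Neo4j internals).


class NullMergeError(ValueError):
    """Stands in for Neo4j's 'Cannot merge node using null property value' error."""


def merge_node(nodes, label, key):
    """Return every node with `label` whose properties equal `key`; create one if none match."""
    for prop, value in key.items():
        if value is None:
            # A null key value cannot be matched, so the whole MERGE is rejected.
            raise NullMergeError(f"Cannot merge node using null property value for {prop}")
    matches = [
        node for node in nodes
        if node["label"] == label and all(node["props"].get(k) == v for k, v in key.items())
    ]
    if matches:
        return matches
    created = {"label": label, "props": dict(key)}
    nodes.append(created)
    return [created]


def load_keyless(addresses):
    """Mimic: MERGE (a:Address1) ON CREATE SET a.address_name1 = x ON MATCH SET a.address_name1 = x."""
    nodes = []
    for address in addresses:
        # An empty key matches any Address1 node, so after the first row
        # every later row lands on that same node.
        for node in merge_node(nodes, "Address1", {}):
            if address is None:
                node["props"].pop("address_name1", None)  # SET prop = null removes it
            else:
                node["props"]["address_name1"] = address
    return nodes


# Hypothetical sample values for line1.address1; None plays the role of an empty CSV cell.
nodes = load_keyless(["1 Main St", None, "2 Oak Ave"])
print(len(nodes), nodes[0]["props"])  # 1 {'address_name1': '2 Oak Ave'}
```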

2 answers:

Answer 0 (score: 6)

Do you really need to create an Address node if there is no address?

You can filter those rows out of the CSV with WITH / WHERE:

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///C:/Users/Zona5/Documents/Neo4j/checkIntel/import/personaldata.csv' AS line1
WITH line1
WHERE NOT line1.address1 IS NULL
MERGE (a:Address1 {address_name1:line1.address1})
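The same idea, sketched in plain Python as a simulation only: `read_rows` parses CSV text and, assuming LOAD CSV hands empty cells to Cypher as null (which is what the error message implies), maps them to None; `merge_addresses` applies the WHERE filter and then keeps one entry per distinct address, the way MERGE on address_name1 would. The CSV text and its `name` column are hypothetical; only `address1` comes from the question.

```python
import csv
import io


def read_rows(text):
    """Parse CSV text with a header row; empty cells become None (assumed LOAD CSV behaviour)."""
    reader = csv.DictReader(io.StringIO(text))
    return [{k: (v if v != "" else None) for k, v in row.items()} for row in reader]


def merge_addresses(rows):
    """WHERE NOT line1.address1 IS NULL, then MERGE on address_name1 (one node per value)."""
    addresses = []
    for row in rows:
        address = row["address1"]
        if address is None:
            continue  # filtered out by the WHERE clause, so MERGE never sees a null
        if address not in addresses:
            addresses.append(address)  # MERGE creates a node only for a new value
    return addresses


# Hypothetical personaldata.csv contents.
SAMPLE = "name,address1\nAna,Calle 1\nLuis,\nEva,Calle 1\nJuan,Calle 2\n"
print(merge_addresses(read_rows(SAMPLE)))  # ['Calle 1', 'Calle 2']
```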

Otherwise, if you want to create a node that represents an "unknown" address, you can use coalesce() to substitute a default value:

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///C:/Users/Zona5/Documents/Neo4j/checkIntel/import/personaldata.csv' AS line1
MERGE (a:Address1 {address_name1: coalesce(line1.address1, "Unknown")})
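What coalesce() does to the MERGE key can be mirrored in plain Python. This is a sketch, not Neo4j code; `coalesce` imitates Cypher's function of the same name, while `merge_with_default` and the sample addresses are hypothetical.

```python
def coalesce(*values):
    """Return the first argument that is not None, like Cypher's coalesce()."""
    for value in values:
        if value is not None:
            return value
    return None


def merge_with_default(addresses, default="Unknown"):
    """MERGE on coalesce(address, default): missing addresses share one placeholder node."""
    nodes = []
    for address in addresses:
        key = coalesce(address, default)
        if key not in nodes:
            nodes.append(key)
    return nodes


print(merge_with_default(["Calle 1", None, None, "Calle 2"]))  # ['Calle 1', 'Unknown', 'Calle 2']
print(repr(coalesce("", "Unknown")))  # '' : an empty string is not null
```

Two consequences are worth keeping in mind: every row without an address is attached to the same "Unknown" node, and coalesce() only replaces null, so a literal empty string would pass through unchanged.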

Answer 1 (score: 2)

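Hi: I'm posting this rather extensive answer because I recently had surprising difficulty dealing with the NULL (missing) values in my CSV files while trying to load those data into Neo4j (neo4j 3.3.4).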

I present three solutions.

I'm using the Cycli (cycli 0.7.6) CLI, installed via pip in a Python 3.5 venv on an Arch Linux x86_64 system.

My CSV file (glycolysis_metabolites.csv) is:

name,abbreviation,kegg_entry
α-D-glucose,GLC,C00267
glucose 6-phosphate,G6P,C00668
fructose 6-phosphate,F6P,C05345
"fructose 1,6-bisphosphate",FBP,C05378
dihydroxyacetone phosphate,DHAP,C00111
D-glyceraldehyde 3-phosphate,,C00118
"1,3-bisphosphoglycerate","1,3-BPG",C00236
3-phosphoglycerate,3PG,C00197
2-phosphoglycerate,2PG,C00631
phosphoenolpyruvate,PEP,C00074
pyruvate,,C00022

Those data, copied from a PostgreSQL table via a psql / COPY ... command, have a "UNIQUE NOT NULL" constraint on the "name" field.
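A quick way to check what LOAD CSV will see is to parse the file the same way in plain Python. The sketch below is a stand-in for LOAD CSV, assuming, as the NULL values in this answer's outputs show, that empty cells reach Cypher as null; `load_rows` and `GLYCOLYSIS_CSV` are names introduced here. It confirms that the quoted fields containing commas parse as single values, that `name` (backed by the PostgreSQL UNIQUE NOT NULL constraint) has no gaps or duplicates, which is what makes it a safe MERGE key on its own, and which rows have gaps in the other columns.

```python
import csv
import io

# Contents of glycolysis_metabolites.csv, copied from above.
GLYCOLYSIS_CSV = """\
name,abbreviation,kegg_entry
α-D-glucose,GLC,C00267
glucose 6-phosphate,G6P,C00668
fructose 6-phosphate,F6P,C05345
"fructose 1,6-bisphosphate",FBP,C05378
dihydroxyacetone phosphate,DHAP,C00111
D-glyceraldehyde 3-phosphate,,C00118
"1,3-bisphosphoglycerate","1,3-BPG",C00236
3-phosphoglycerate,3PG,C00197
2-phosphoglycerate,2PG,C00631
phosphoenolpyruvate,PEP,C00074
pyruvate,,C00022
"""


def load_rows(text):
    """Parse the CSV; empty cells become None to mirror LOAD CSV's null (assumption)."""
    reader = csv.DictReader(io.StringIO(text))
    return [{k: (v if v != "" else None) for k, v in row.items()} for row in reader]


rows = load_rows(GLYCOLYSIS_CSV)
names = [row["name"] for row in rows]

# name mirrors the PostgreSQL UNIQUE NOT NULL column, so it can serve as the MERGE key.
assert None not in names and len(set(names)) == len(names)

# Which rows have a gap in each optional column?
missing = {
    col: [row["name"] for row in rows if row[col] is None]
    for col in ("abbreviation", "kegg_entry")
}
print(missing)
# {'abbreviation': ['D-glyceraldehyde 3-phosphate', 'pyruvate'], 'kegg_entry': []}
```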

After researching on Google and elsewhere, I ran three experiments, shown below. Experiments 2 and 3 are essentially the same.

I think the approach shown in Experiment 2 is the best solution, because the COALESCE statements are contained within the MERGE statement.

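I reached this conclusion because Experiment 2 uses "local" variables instead of returning "global" variables (Experiment 3), which minimizes unintended consequences from reused variable names.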

I load the Cypher script as follows:

cat glycolysis_script.cypher |  cypher-shell -u victoria -p <your_password>

Experiment 1

Reference: http://markhneedham.com/blog/2014/08/22/neo4j-load-csv-handling-empty-columns/

This solution (Mark Needham's) is quite clever: it creates nodes that contain all of the non-NULL properties, e.g.

<id>: 0 abbreviation: GLC kegg_entry: C00267 name: α-D-glucose
<id>: 10 kegg_entry: C00022 name: pyruvate

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/mnt/Vancouver/Programming/data/metabolism/pg2neo4j/glycolysis_metabolites.csv" AS row
MERGE (a:GlycolysisMetabolites {name: row.name})
FOREACH(ignoreMe IN CASE WHEN row.abbreviation <> "" THEN [1] ELSE [] END | SET a.abbreviation = row.abbreviation)
FOREACH(ignoreMe IN CASE WHEN row.kegg_entry <> "" THEN [1] ELSE [] END | SET a.kegg_entry = row.kegg_entry)
// With "USING PERIODIC COMMIT",
// RETURN a;
// throws this error: "Unknown value type: STRUCT"
// ... so, use this:
RETURN a.name, a.abbreviation, a.kegg_entry;

Output:

$ cat glycolysis.cypher |  cypher-shell -u victoria -p <your_password>

a.name, a.abbreviation, a.kegg_entry
"α-D-glucose", "GLC", "C00267"
"glucose 6-phosphate", "G6P", "C00668"
"fructose 6-phosphate", "F6P", "C05345"
"fructose 1,6-bisphosphate", "FBP", "C05378"
"dihydroxyacetone phosphate", "DHAP", "C00111"
"D-glyceraldehyde 3-phosphate", NULL, "C00118"
"1,3-bisphosphoglycerate", "1,3-BPG", "C00236"
"3-phosphoglycerate", "3PG", "C00197"
"2-phosphoglycerate", "2PG", "C00631"
"phosphoenolpyruvate", "PEP", "C00074"
"pyruvate", NULL, "C00022"

However, you cannot put a property that contains NULL values (here: "abbreviation") into your own MERGE pattern, because you cannot MERGE on a NULL property value.

This works:

MERGE (a:GlycolysisMetabolites {name: row.name})

but these fail ("Cannot merge node using null property value for abbreviation"):

MERGE (a:GlycolysisMetabolites {name: row.name, abbreviation:row.abbreviation})
MERGE (a:GlycolysisMetabolites {name: row.name, abbreviation:row.abbreviation, kegg_entry:row.kegg_entry})
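The difference between the working and the failing patterns can be mirrored in plain Python. This is a simulation, not Neo4j code; `failing_rows`, `load_conditional_set` and `ROWS` (three rows copied from the CSV, with None standing for an empty cell) are names introduced here. `failing_rows` lists the rows whose MERGE pattern would contain a null, and `load_conditional_set` follows Mark Needham's approach: MERGE on the always-present `name`, then set each optional property only when it has a value. In Cypher, `null <> ""` evaluates to null rather than true, so the CASE also yields the empty list for a null cell and the SET is skipped.

```python
# Three rows from glycolysis_metabolites.csv; None stands for an empty cell.
ROWS = [
    {"name": "α-D-glucose", "abbreviation": "GLC", "kegg_entry": "C00267"},
    {"name": "D-glyceraldehyde 3-phosphate", "abbreviation": None, "kegg_entry": "C00118"},
    {"name": "pyruvate", "abbreviation": None, "kegg_entry": "C00022"},
]


def failing_rows(rows, key):
    """Names of rows whose MERGE pattern over `key` would contain a null value."""
    return [row["name"] for row in rows if any(row[prop] is None for prop in key)]


def load_conditional_set(rows):
    """MERGE on name, then SET each optional property only when it has a value."""
    nodes = {}
    for row in rows:
        node = nodes.setdefault(row["name"], {"name": row["name"]})  # MERGE {name: row.name}
        for prop in ("abbreviation", "kegg_entry"):
            value = row[prop]
            # CASE WHEN value <> "" THEN [1] ELSE [] END gives [] for null and for "".
            if value is not None and value != "":
                node[prop] = value
    return nodes


print(failing_rows(ROWS, ("name",)))                 # []
print(failing_rows(ROWS, ("name", "abbreviation")))  # ['D-glyceraldehyde 3-phosphate', 'pyruvate']
print(load_conditional_set(ROWS)["pyruvate"])        # {'name': 'pyruvate', 'kegg_entry': 'C00022'}
```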

Experiment 2

Reference: Neo4j use MERGE with null values

Here I set an empty string ('') as a substitute for the NULL values present in the CSV file; you can use anything you like, e.g. 'Undefined', 'null', ...

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/mnt/Vancouver/Programming/data/metabolism/pg2neo4j/glycolysis_metabolites.csv" AS row
// MERGE (a:GlycolysisMetabolites {name: row.name})
MERGE (a:GlycolysisMetabolites {name: row.name, abbreviation:COALESCE(row.abbreviation, ''), kegg_entry:COALESCE(row.kegg_entry, '')})
// With "USING PERIODIC COMMIT",
// RETURN a;
// throws this error: "Unknown value type: STRUCT"
// ... so, use this:
RETURN a.name, a.abbreviation, a.kegg_entry;

Output:

$ cat glycolysis.cypher |  cypher-shell -u victoria -p <your_password>

a.name, a.abbreviation, a.kegg_entry
"α-D-glucose", "GLC", "C00267"
"glucose 6-phosphate", "G6P", "C00668"
"fructose 6-phosphate", "F6P", "C05345"
"fructose 1,6-bisphosphate", "FBP", "C05378"
"dihydroxyacetone phosphate", "DHAP", "C00111"
"D-glyceraldehyde 3-phosphate", "", "C00118"
"1,3-bisphosphoglycerate", "1,3-BPG", "C00236"
"3-phosphoglycerate", "3PG", "C00197"
"2-phosphoglycerate", "2PG", "C00631"
"phosphoenolpyruvate", "PEP", "C00074"
"pyruvate", "", "C00022"

Experiment 3

References:

Neo4j use MERGE with null values

https://github.com/neo4j/neo4j/issues/2521

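This also works, but because the COALESCE statements are outside the MERGE statement, I worried that the data returned by the RETURN statement could cause problems if those variable names were reused elsewhere. As a workaround I added a prefix (a_) as a quasi-UID, but I think the solution in Experiment 2 above is the better approach.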

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/mnt/Vancouver/Programming/data/metabolism/pg2neo4j/glycolysis_metabolites.csv" AS row
WITH
  COALESCE(CASE row.name WHEN '' THEN null ELSE row.name END, '') AS a_name,
  COALESCE(CASE row.abbreviation WHEN '' THEN null ELSE row.abbreviation END, '') AS a_abbreviation,
  COALESCE(CASE row.kegg_entry WHEN '' THEN null ELSE row.kegg_entry END, '') AS a_kegg_entry
MERGE (a:GlycolysisMetabolites {name:a_name, abbreviation:a_abbreviation, kegg_entry:a_kegg_entry})
// Note: RETURN can only be used at the end of the query
RETURN a_name, a_abbreviation, a_kegg_entry;

Output:

$ cat glycolysis.cypher |  cypher-shell -u victoria -p <your_password>

a_name, a_abbreviation, a_kegg_entry
"α-D-glucose", "GLC", "C00267"
"glucose 6-phosphate", "G6P", "C00668"
"fructose 6-phosphate", "F6P", "C05345"
"fructose 1,6-bisphosphate", "FBP", "C05378"
"dihydroxyacetone phosphate", "DHAP", "C00111"
"D-glyceraldehyde 3-phosphate", "", "C00118"
"1,3-bisphosphoglycerate", "1,3-BPG", "C00236"
"3-phosphoglycerate", "3PG", "C00197"
"2-phosphoglycerate", "2PG", "C00631"
"phosphoenolpyruvate", "PEP", "C00074"
"pyruvate", "", "C00022"

Other StackOverflow discussions of this topic/issue: https://stackoverflow.com/search?tab=votes&q=Neo4j%20use%20MERGE%20with%20null%20value
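Experiments 2 and 3 both end up merging on all three properties, with '' standing in for null. A plain-Python sketch of that pipeline (a simulation with names introduced here: `coalesce`, `normalize`, `merge_composite`; 'PYR' is an invented value standing for a later, corrected CSV) makes two points visible. First, Experiment 3's CASE step only matters when the default is not '': coalesce() leaves a literal empty string untouched, so with '' as the default the result is the same either way. Second, the placeholder becomes part of the node's identity: re-running the same load is idempotent, but once a missing value is filled in, the three-property MERGE no longer matches the old node and, assuming no uniqueness constraint on :GlycolysisMetabolites(name), creates a second one.

```python
def coalesce(*values):
    """First argument that is not None, like Cypher's COALESCE()."""
    return next((value for value in values if value is not None), None)


def normalize(value, default=""):
    """COALESCE(CASE value WHEN '' THEN null ELSE value END, default), as in Experiment 3."""
    as_null = None if value == "" else value  # a null input also falls through to ELSE
    return coalesce(as_null, default)


def merge_composite(nodes, row):
    """MERGE on (name, abbreviation, kegg_entry) with '' substituted for null."""
    key = tuple(normalize(row[prop]) for prop in ("name", "abbreviation", "kegg_entry"))
    if key not in nodes:  # no node with exactly these values yet, so MERGE creates one
        nodes.append(key)
    return key


nodes = []
pyruvate = {"name": "pyruvate", "abbreviation": None, "kegg_entry": "C00022"}
merge_composite(nodes, pyruvate)
merge_composite(nodes, pyruvate)  # re-running the load matches the existing node
# Hypothetical later CSV that fills in the abbreviation:
merge_composite(nodes, {"name": "pyruvate", "abbreviation": "PYR", "kegg_entry": "C00022"})
print(nodes)  # [('pyruvate', '', 'C00022'), ('pyruvate', 'PYR', 'C00022')]
```

Merging on `name` alone and setting the other properties afterwards, as in Experiment 1, would update the existing node instead.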

Addendum

Reference (for example): Neo4j CSV file load with empty cells

This "works", but SKIPS creating a node if any field contains a NULL value:

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/mnt/Vancouver/Programming/data/metabolism/pg2neo4j/glycolysis_metabolites.csv" AS row
FOREACH (
    x IN CASE WHEN row.abbreviation IS NULL OR row.kegg_entry IS NULL THEN [] ELSE [1] END |
    MERGE (a:GlycolysisMetabolites {name: row.name, abbreviation: row.abbreviation, kegg_entry: row.kegg_entry})
    )
RETURN row.name, row.abbreviation, row.kegg_entry;

Output:

$ cat glycolysis.cypher |  cypher-shell -u victoria -p <password>

row.name, row.abbreviation, row.kegg_entry
"α-D-glucose", "GLC", "C00267"
"glucose 6-phosphate", "G6P", "C00668"
"fructose 6-phosphate", "F6P", "C05345"
"fructose 1,6-bisphosphate", "FBP", "C05378"
"dihydroxyacetone phosphate", "DHAP", "C00111"
"D-glyceraldehyde 3-phosphate", NULL, "C00118"
"1,3-bisphosphoglycerate", "1,3-BPG", "C00236"
"3-phosphoglycerate", "3PG", "C00197"
"2-phosphoglycerate", "2PG", "C00631"
"phosphoenolpyruvate", "PEP", "C00074"
"pyruvate", NULL, "C00022"

Note that in the Neo4j Browser only 9 nodes (not 11) were created: the nodes for "D-glyceraldehyde 3-phosphate" and "pyruvate" were not created.
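The count of 9 can be reproduced with a plain-Python walk over the same 11 rows. This is a sketch, not Neo4j code; `load_skip_incomplete` and `ROWS` are names introduced here, and None stands for an empty cell. FOREACH runs its MERGE once for `[1]` and zero times for `[]`, but it does not remove the row from the query, which is why RETURN still lists all 11 rows, NULLs included.

```python
# glycolysis_metabolites.csv as (name, abbreviation, kegg_entry); None = empty cell.
ROWS = [
    ("α-D-glucose", "GLC", "C00267"),
    ("glucose 6-phosphate", "G6P", "C00668"),
    ("fructose 6-phosphate", "F6P", "C05345"),
    ("fructose 1,6-bisphosphate", "FBP", "C05378"),
    ("dihydroxyacetone phosphate", "DHAP", "C00111"),
    ("D-glyceraldehyde 3-phosphate", None, "C00118"),
    ("1,3-bisphosphoglycerate", "1,3-BPG", "C00236"),
    ("3-phosphoglycerate", "3PG", "C00197"),
    ("2-phosphoglycerate", "2PG", "C00631"),
    ("phosphoenolpyruvate", "PEP", "C00074"),
    ("pyruvate", None, "C00022"),
]


def load_skip_incomplete(rows):
    """Mimic FOREACH (x IN CASE WHEN ... IS NULL THEN [] ELSE [1] END | MERGE ...)."""
    created, returned = [], []
    for name, abbreviation, kegg_entry in rows:
        guard = [] if abbreviation is None or kegg_entry is None else [1]
        for _ in guard:  # zero iterations means no MERGE for this row
            node = (name, abbreviation, kegg_entry)
            if node not in created:
                created.append(node)
        returned.append(name)  # FOREACH does not filter rows, so RETURN still sees it
    return created, returned


created, returned = load_skip_incomplete(ROWS)
print(len(created), len(returned))  # 9 11
```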