我试图使用这些数据在Neo4j中创建此问题底部图像中显示的糖酵解途径:
glycolysis_bioentities.csv
name
α-D-glucose
glucose 6-phosphate
fructose 6-phosphate
"fructose 1,6-bisphosphate"
dihydroxyacetone phosphate
D-glyceraldehyde 3-phosphate
"1,3-bisphosphoglycerate"
3-phosphoglycerate
2-phosphoglycerate
phosphoenolpyruvate
pyruvate
hexokinase
glucose-6-phosphatase
phosphoglucose isomerase
phosphofructokinase
"fructose-bisphosphate aldolase, class I"
triosephosphate isomerase (TIM)
glyceraldehyde-3-phosphate dehydrogenase
phosphoglycerate kinase
phosphoglycerate mutase
enolase
pyruvate kinase
glycolysis_relations.csv
source,relation,target
α-D-glucose,substrate_of,hexokinase
hexokinase,yields,glucose 6-phosphate
glucose 6-phosphate,substrate_of,glucose-6-phosphatase
glucose-6-phosphatase,yields,α-D-glucose
glucose 6-phosphate,substrate_of,phosphoglucose isomerase
phosphoglucose isomerase,yields,fructose 6-phosphate
fructose 6-phosphate,substrate_of,phosphofructokinase
phosphofructokinase,yields,"fructose 1,6-bisphosphate"
"fructose 1,6-bisphosphate",substrate_of,"fructose-bisphosphate aldolase, class I"
"fructose-bisphosphate aldolase, class I",yields,D-glyceraldehyde 3-phosphate
D-glyceraldehyde 3-phosphate,substrate_of,glyceraldehyde-3-phosphate dehydrogenase
D-glyceraldehyde 3-phosphate,substrate_of,triosephosphate isomerase (TIM)
triosephosphate isomerase (TIM),yields,dihydroxyacetone phosphate
glyceraldehyde-3-phosphate dehydrogenase,yields,"1,3-bisphosphoglycerate"
"1,3-bisphosphoglycerate",substrate_of,phosphoglycerate kinase
phosphoglycerate kinase,yields,3-phosphoglycerate
3-phosphoglycerate,substrate_of,phosphoglycerate mutase
phosphoglycerate mutase,yields,2-phosphoglycerate
2-phosphoglycerate,substrate_of,enolase
enolase,yields,phosphoenolpyruvate
phosphoenolpyruvate,substrate_of,pyruvate kinase
pyruvate kinase,yields,pyruvate
到目前为止,这就是我所拥有的
...使用此密码(传递给Cycli
或cypher-shell
):
LOAD CSV WITH HEADERS FROM "file:/glycolysis_relations.csv" AS row
MERGE (s:Glycolysis {source: row.source})
MERGE (r:Glycolysis {relation: row.relation})
MERGE (t:Glycolysis {target: row.target})
FOREACH (x in case row.relation when "substrate_of" then [1] else [] end |
MERGE (s)-[r:substrate_of]->(t)
)
FOREACH (x in case row.relation when "yields" then [1] else [] end |
MERGE (s)-[r:yields]->(t)
);
我想创建一个完全连接的路径,所有节点都有字幕。建议?
答案 0 :(得分:3)
[增订]
存在多个问题和可能的改进:
MERGE
,因为它会创建孤立的节点。不应将关系类型调整为Glycolysis
节点,并且此类节点永远不会连接到任何其他节点。MERGE
子句必须对源节点和目标节点使用相同的属性名称(例如,name
),否则相同的化学品最终会有2个节点(具有不同的属性键) )。这就是为什么你最终得到了没有所有预期连接的节点。 MERGE
关系。glycolysis_bioentities.csv
。通过上述更改,您最终得到类似的内容,这将生成与输入数据匹配的连接图:
LOAD CSV WITH HEADERS FROM "file:/glycolysis_relations.csv" AS row
MERGE (s:Glycolysis {name: row.source})
MERGE (t:Glycolysis {name: row.target})
WITH s, t, row
CALL apoc.cypher.doIt(
'MERGE (s)-[r:' + row.relation + ']->(t)',
{s:s, t:t}) YIELD value
RETURN 1;
答案 1 :(得分:3)
由于这个问题/答案/话题很可能引起其他人的兴趣,我想提一下我的代码(基于这个SO线程,How to specify relationship type in CSV?,并根据@cybersam提供的提示进行修改)有效,并显示结果:
解决方案1(我原来的帖子,已更新):
LOAD CSV WITH HEADERS FROM "file:/glycolysis_relations.csv" AS row
MERGE (s:Glycolysis {name:row.source})
MERGE (t:Glycolysis {name:row.target})
FOREACH (x in case row.relation when "substrate_of" then [1] else [] end |
MERGE (s)-[r:substrate_of]->(t)
)
FOREACH (x in case row.relation when "yields" then [1] else [] end |
MERGE (s)-[r:yields]->(t)
);
解决方案2(cybersam' s,更新):
LOAD CSV WITH HEADERS FROM "file:/glycolysis_relations.csv" AS row
MERGE (s:Metabolism:Glycolysis {name: row.source})
MERGE (t:Metabolism:Glycolysis {name: row.target})
WITH s, t, row
// "Bug" -- additional duplicate relations with each iteration of this statement/script:
// CALL apoc.create.relationship(s, row.relation, {}, t) YIELD rel
// Solution:
// https://github.com/neo4j-contrib/neo4j-apoc-procedures/issues/271
// https://stackoverflow.com/questions/47808421/neo4j-load-csv-to-create-dynamic-relationship-types
CALL apoc.merge.relationship(s, row.relation, {}, {}, t) YIELD rel
RETURN COUNT(*);
两种解决方案都生成相同的图表,如下所示。 :-D
答案 2 :(得分:0)
如果允许的话,我想再发布一个后续答案 - 我的理由是目前在Neo4j中重建代谢途径的情况很少,以下将提供完整的摘要 StackOverflow标题/主题,“在Neo4j中创建代谢途径”。
与上面的糖酵解途径一样,我在Neo4j中重新创建 TCA (柠檬酸循环 | Kreb's cycle )途径:
[TCA周期图像来源:https://metabolicpathways.stanford.edu/]
在创建我的TCA途径图期间出现的一个问题是,其中一个节点(酶,“aconitase”)被使用了两次,因此在图创建期间data = [
11 76 25 44 55 78;
11 75 25 44 55 75;
11 75 25 44 55 75;
11 75 25 44 55 75;
11 75 25 44 55 75;
11 0 25 44 55 0;
11 0 25 44 55 0;
11 0 25 44 55 88;
11 0 25 44 55 0;
11 0 25 44 55 0;
];
for row = 1:5:size(data, 1)
fprintf('Row %d - %d\n', row, row+4);
indices = find(data(row:row+4,:) > 75);
if ~isempty(indices)
[~, cols] = ind2sub([5 size(data, 2)], indices);
col_min = min(cols);
col_max = max(cols);
fprintf('Column: %d and %d\n', col_min, col_max);
end
end
合并了公共节点{{ 1}}作为单个实体,产生这种布局,
......根据需要不是这个,
我对该问题的解决方案是使用节点属性创建“TCA图”,暂时差异标记受影响的源节点和目标节点(稍后在正确创建图形后删除这些标记)。
我还添加了MERGE
标签,以便我可以根据需要选择单个路径(aconitase
| :Metabolism
)或完整代谢网络(:Glycolysis
)
最后,我需要通过它们的公共节点:TCA
连接两个路径(:Metabolism
| :Glycolysis
),我可以通过APOC过程(此处,附加)到我的:TCA
(Cypher)剧本的末尾。
以下是我的CSV数据文件,* .cql Cypher脚本,脚本执行以及结果图。
<强> glycolysis.csv:强>
pyruvate
<强> tca.csv:强>
glycolysis.cql
<强> tca.cql:强>
source,relation,target
α-D-glucose,substrate_of,hexokinase
hexokinase,yields,glucose 6-phosphate
glucose 6-phosphate,substrate_of,glucose-6-phosphatase
glucose-6-phosphatase,yields,α-D-glucose
glucose 6-phosphate,substrate_of,phosphoglucose isomerase
phosphoglucose isomerase,yields,fructose 6-phosphate
fructose 6-phosphate,substrate_of,phosphofructokinase
phosphofructokinase,yields,"fructose 1,6-bisphosphate"
"fructose 1,6-bisphosphate",substrate_of,"fructose-bisphosphate aldolase, class I"
"fructose-bisphosphate aldolase, class I",yields,D-glyceraldehyde 3-phosphate
D-glyceraldehyde 3-phosphate,substrate_of,glyceraldehyde-3-phosphate dehydrogenase
D-glyceraldehyde 3-phosphate,substrate_of,triosephosphate isomerase (TIM)
triosephosphate isomerase (TIM),yields,dihydroxyacetone phosphate
glyceraldehyde-3-phosphate dehydrogenase,yields,"1,3-bisphosphoglycerate"
"1,3-bisphosphoglycerate",substrate_of,phosphoglycerate kinase
phosphoglycerate kinase,yields,3-phosphoglycerate
3-phosphoglycerate,substrate_of,phosphoglycerate mutase
phosphoglycerate mutase,yields,2-phosphoglycerate
2-phosphoglycerate,substrate_of,enolase
enolase,yields,phosphoenolpyruvate
phosphoenolpyruvate,substrate_of,pyruvate kinase
pyruvate kinase,yields,pyruvate
<强> glycolysis.cql:强>
source,relation,target,tag1,tag2
pyruvate,substrate_of,pyruvate dehydrogenase,,
pyruvate dehydrogenase,yields,acetyl CoA,,
acetyl CoA,substrate_of,citrate synthase,,
oxaloacetate,substrate_of,citrate synthase,,
citrate synthase,yields,citrate,,
citrate,substrate_of,aconitase,,1
aconitase,yields,cis-aconitate,1,
cis-aconitate,substrate_of,aconitase,,2
aconitase,yields,isocitrate,2,
isocitrate,substrate_of,isocitrate dehydrogenase,,
isocitrate dehydrogenase,yields,α-ketoglutarate,,
α-ketoglutarate,substrate_of,α-ketoglutarate dehydrogenase,,
α-ketoglutarate dehydrogenase,yields,succinyl-CoA,,
succinyl-CoA,substrate_of,succinyl-CoA synthetase,,
succinyl-CoA synthetase,yields,succinate,,
succinate,substrate_of,succinate dehydrogenase,,
succinate dehydrogenase,yields,fumarate,,
fumarate,substrate_of,fumarase,,
fumarase,yields,S-malate,,
S-malate,substrate_of,malate dehydrogenase,,
malate dehydrogenase,yields,oxaloacetate,,
脚本执行:
// CREATE INDICES:
CREATE INDEX ON :Metabolism(name);
CREATE INDEX ON :TCA(name);
// CREATE GRAPH:
// USING PERIODIC COMMIT 5000
LOAD CSV WITH HEADERS FROM "file:/mnt/Vancouver/Programming/data/metabolism/tca.csv" AS row
MERGE (s:Metabolism:TCA {name: row.source, tag:COALESCE(row.tag1, '')})
MERGE (t:Metabolism:TCA {name: row.target, tag:COALESCE(row.tag2, '')})
WITH s, t, row
CALL apoc.merge.relationship(s, row.relation, {}, {}, t) YIELD rel
REMOVE s.tag, t.tag
RETURN COUNT(*);
Neo4j图表(// CREATE INDICES:
CREATE INDEX ON :Metabolism(name);
CREATE INDEX ON :Glycolysis(name);
// CREATE GRAPH:
//USING PERIODIC COMMIT 5000
LOAD CSV WITH HEADERS FROM "file:/mnt/Vancouver/Programming/data/metabolism/glycolysis.csv" AS row
MERGE (s:Metabolism:Glycolysis {name: row.source})
MERGE (t:Metabolism:Glycolysis {name: row.target})
WITH s, t, row
CALL apoc.merge.relationship(s, row.relation, {}, {}, t) YIELD rel
RETURN COUNT(*);
// MERGE COMMON NODE (GLYCOLYSIS: PYRUVATE; TCA: PYRUVATE):
// As presented, run "tca.cql" first, then "glycolysis.cql"
MATCH (g:Glycolysis), (t:TCA) WHERE g.name = t.name
CALL apoc.refactor.mergeNodes([g,t]) YIELD node
RETURN node;
查看):