Question

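I have a relational database whose tables I exported to CSV files. I imported two of them and created nodes by specifying which columns to pick up, as shown in the code below: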

import csv
from py2neo import neo4j, authenticate, Graph, Node, cypher, rel, Relationship
authenticate("localhost:7474", "neo4j", "my_password")
graph_db = Graph()
graph_db.delete_all()

# Import all rows and columns of the csv files

with open('File1.csv', "rb") as abc_file, open('File2.csv', "rb") as efg_file:
    data1 = csv.reader(abc_file, delimiter=';')
    data2 = csv.reader(efg_file, delimiter=';')
    next(data1)  # skip the header row
    next(data2)  # skip the header row

    # Create a node for every row of the "Contact Email" column of abc_file
    rownum = 0
    for row in data1:  # iterate over the parsed rows, not the raw file
        nodes1 = Node("Contact_Email", email=row[0])
        contact_graph = graph_db.create(nodes1)

    # Create nodes for every row of the "Building_Name" and "Person_Created"
    # columns of efg_file
    rownum = 0
    for row in data2:  # iterate over the parsed rows, not the raw file
        nodes2 = Node("Building_Name", name=row[0])
        nodes3 = Node("Person_Created", name=row[1])
        building_graph = graph_db.create(nodes2, nodes3)

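Suppose there are 60 emails under the "Contact_Email" column of "File1.csv"; this column is the Primary_Key. It is used as a Foreign_Key in "File2.csv" under the "Person_Created" column. There are 14 buildings listed under "Building_Name", with the corresponding emails given in the "Person_Created" column. My questions are:

1) How do I match the 14 emails in the "Person_Created" column of File2.csv with the emails in the "Contact Email" column of File1.csv, so as to avoid duplication?

2) How do I create a relationship between "Building Name" (in File2.csv) and "Person_Created" (in File1.csv) without any duplication, something like "Building1234 is DESIGNED_BY abc@xyz.com"?

How can I do this in py2neo, with or without Cypher?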

Answer 1

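Create an index or a unique constraint for Contact Email.

It is probably a good idea to name the node's property email.

While iterating over Person_Created, use the email foreign-key value to create the Contact Email node, with email as its property.

Because the index/constraint is in place, the node is created conditionally, that is, only when it is merged on that property and no node with the same email exists yet.

Create the relationship between Person Created and Contact Email in the same iteration.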
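
The steps in this answer could be sketched roughly as follows. This is only an illustration under several assumptions: the constraint and `$parameter` syntax shown is the form accepted by Neo4j 3.x/4.x (older servers use `{email}` style parameters, newer ones a different constraint syntax), `run` is a hypothetical callable that sends one statement plus its parameters to the server (in py2neo 2.0 that would presumably be `graph_db.cypher.execute`), and the sample rows stand in for File2.csv. Here a recording function replaces the server so the sketch runs on its own.

```python
import csv
import io

# Assumed Neo4j 3.x/4.x syntax; run these once before loading data.
CONSTRAINTS = [
    "CREATE CONSTRAINT ON (c:Contact_Email) ASSERT c.email IS UNIQUE",
    "CREATE CONSTRAINT ON (b:Building_Name) ASSERT b.name IS UNIQUE",
]

# MERGE creates each node or relationship only if it does not exist yet.
LINK = (
    "MERGE (c:Contact_Email {email: $email}) "
    "MERGE (b:Building_Name {name: $name}) "
    "MERGE (b)-[:DESIGNED_BY]->(c)"
)


def create_constraints(run):
    """Send the uniqueness constraints through the caller-supplied `run`."""
    for statement in CONSTRAINTS:
        run(statement, {})


def link_buildings(rows, run):
    """Send one MERGE statement per (building name, email) row."""
    for name, email in rows:
        run(LINK, {"name": name, "email": email})


# Hypothetical sample standing in for File2.csv (header plus two rows).
sample = io.StringIO(
    "Building_Name;Person_Created\n"
    "Building1234;abc@xyz.com\n"
    "Building5678;abc@xyz.com\n"
)
reader = csv.reader(sample, delimiter=";")
next(reader)  # skip the header row
rows = [(r[0], r[1]) for r in reader]

sent = []  # stand-in for the server: record what would be sent


def record(statement, params):
    sent.append((statement, params))


create_constraints(record)
link_buildings(rows, record)
```

Because both nodes and the relationship are merged rather than created, re-running the load with the same rows should not add duplicates.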

Answer 2

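Py2neo provides a number of uniqueness functions for this. Have a look at this page for merge_one and friends. The node values returned from these can then be stored and used in unique relationships and paths.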
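
A minimal sketch of that idea, assuming the py2neo 2.0 API in which `Graph.merge_one(label, property_key, property_value)` returns the matching or newly created node and `Graph.create_unique` accepts a `Relationship`. The function name `link_with_merge_one` and the `FakeGraph` class are hypothetical; the fake only mimics the get-or-create behaviour so the example runs without a server.

```python
def link_with_merge_one(graph, relationship, rows):
    """Merge both nodes of each row, then create the relationship once.

    `graph` is assumed to offer merge_one and create_unique as in py2neo 2.0;
    `relationship` is the Relationship class, passed in so that this sketch
    does not need to import py2neo itself.
    """
    for name, email in rows:
        contact = graph.merge_one("Contact_Email", "email", email)
        building = graph.merge_one("Building_Name", "name", name)
        graph.create_unique(relationship(building, "DESIGNED_BY", contact))


class FakeGraph(object):
    """Hypothetical in-memory stand-in, only to show get-or-create."""

    def __init__(self):
        self.nodes = {}
        self.rels = set()

    def merge_one(self, label, key, value):
        # Return the existing entry, or store and return a new one.
        return self.nodes.setdefault((label, key, value), (label, value))

    def create_unique(self, rel):
        # A set keeps each (start, type, end) triple only once.
        self.rels.add(rel)


rows = [
    ("Building1234", "abc@xyz.com"),
    ("Building5678", "abc@xyz.com"),
    ("Building1234", "abc@xyz.com"),  # repeated row, must not duplicate
]
fake = FakeGraph()
link_with_merge_one(fake, lambda a, t, b: (a, t, b), rows)
print(len(fake.nodes), len(fake.rels))  # 3 nodes, 2 relationships
```

Against a real py2neo 2.0 graph the call would presumably look like `link_with_merge_one(graph_db, Relationship, rows)`.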

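Note that for higher performance you may want to look at Cypher transactions or batching. Without them, every action requires its own call to the server, which becomes slow at scale.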
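
One way to cut down on round trips, sketched here under assumptions, is to send many rows per statement with Cypher's `UNWIND`. The `$rows` parameter syntax assumes Neo4j 3.x or later, `run` is a hypothetical callable that submits one statement with its parameters, and the batch size and sample rows are made up for illustration. This is not the transaction API the answer refers to, just a batching variant with the same goal.

```python
def chunks(items, size):
    """Yield consecutive slices of `items` holding at most `size` entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# One statement handles a whole list of rows passed as a parameter.
BATCH_LINK = (
    "UNWIND $rows AS row "
    "MERGE (c:Contact_Email {email: row.email}) "
    "MERGE (b:Building_Name {name: row.name}) "
    "MERGE (b)-[:DESIGNED_BY]->(c)"
)


def link_in_batches(rows, run, size=1000):
    """Send one statement per batch of rows instead of one per row."""
    payload = [{"name": name, "email": email} for name, email in rows]
    for batch in chunks(payload, size):
        run(BATCH_LINK, {"rows": batch})


# Hypothetical data: 14 buildings, all created by the same person.
rows = [("Building%d" % i, "abc@xyz.com") for i in range(14)]

calls = []  # stand-in for the server: record the parameters of each call
link_in_batches(rows, lambda statement, params: calls.append(params), size=5)
print([len(c["rows"]) for c in calls])  # [5, 5, 4]
```

With a batch size of 5, the 14 rows go out in three calls rather than fourteen.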

py2neo - match and merge two nodes from two different csv files and create a relationship
