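I am trying to load a CSV file with 22 columns into a Neo4j graph using py2neo, in order to model flights.
Everything is done in a single Cypher query, which creates the nodes (Airport, City, Flight and Plane) and the relationships between them. But when I run the code, it takes forever, even with USING PERIODIC COMMIT.
I am not sure whether the Cypher query I wrote is well optimized; it may be the source of the slowness. For 10,000 rows, it took about 10 minutes to build the graph. Can anyone help me? Here is the code: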
from py2neo import Graph


def importFromCSVtoNeo(graph):
    # A single Cypher statement creates all nodes and relationships for each CSV row.
    query = '''
    USING PERIODIC COMMIT 1000
    LOAD CSV WITH HEADERS FROM "file:///flights.csv" AS row FIELDTERMINATOR '\t'
    WITH row
    MERGE (c_departure:City {cityName: row.cityName_departure})
    MERGE (a_departure:Airport {airportName: row.airportName_departure})
    MERGE (f_segment1:Flight {airline: row.airline1})
      ON CREATE SET f_segment1.class = row.class1,
                    f_segment1.outboundclassgroup = row.outboundclassgroup1
    MERGE (a_departure)-[:IN]->(c_departure)
    MERGE (c_departure)-[:HAS]->(a_departure)
    MERGE (f_segment1)-[:FROM {departAt: row.outbounddeparttime}]->(a_departure)
    MERGE (c_transfer:City {cityName: row.transferCityName})
    MERGE (a_transfer:Airport {airportName: row.airportName_transfer})
    MERGE (f_segment1)-[:TO_TRANSFER {transferArriveAt: row.transferArriveAt}]->(a_transfer)
    MERGE (a_transfer)-[:IN]->(c_transfer)
    MERGE (c_transfer)-[:HAS]->(a_transfer)
    MERGE (c_arrival:City {cityName: row.cityName_arrival})
    MERGE (a_arrival:Airport {airportName: row.airportName_arrival})
    MERGE (f_segment2:Flight {airline: row.airline2})
      ON CREATE SET f_segment2.class = row.class2,
                    f_segment2.outboundclassgroup = row.outboundclassgroup2
    MERGE (f_segment2)-[:TO {arrivalAt: row.outboundarrivaltime}]->(a_arrival)
    MERGE (f_segment2)-[:FROM_TRANSFER {transferDepartAt: row.transferDepartAt}]->(a_transfer)
    MERGE (a_arrival)-[:IN]->(c_arrival)
    MERGE (c_arrival)-[:HAS]->(a_arrival)
    MERGE (p:Plane {saleprice: row.saleprice})
      ON CREATE SET p.depart = row.cityName_departure,
                    p.destination = row.cityName_arrival,
                    p.salechannel = row.salechannel,
                    p.planeDuration = row.planeDuration
    MERGE (p)-[:HAS_FLIGHTS]->(f_segment1)
    MERGE (f_segment1)-[:WAIT_FOR {waitingTime: row.waitingTime}]->(f_segment2)
    '''
    graph.run(query)


if __name__ == '__main__':
    # Graph() with no arguments connects to the default local Neo4j instance.
    graph = Graph()
    importFromCSVtoNeo(graph)
I also tried running it in batch mode, but the performance did not get any better. I would appreciate any comments or suggestions. Thanks!
Answer 0 (score: 1)
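I would create indexes on the node properties before launching the script, so that Neo4j can use them for fast lookups during MERGE (because it has to MATCH the nodes row by row, and without an index each lookup scans every node with that label). For example, for the first node property I would use: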
CREATE INDEX ON :City(cityName)
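And so on for the other properties used in the MERGE clauses. You can create them directly from py2neo, each as a single run statement.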
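As a rough sketch of that suggestion, the snippet below builds one CREATE INDEX statement for each label/property pair that the question's query MERGEs on, and runs them one at a time. The names `MERGE_KEYS`, `index_statements` and `create_indexes` are hypothetical helpers introduced here for illustration, and `graph` is assumed to be any object with a `run(cypher)` method, such as a py2neo `Graph`. Property names are case-sensitive, so they are copied exactly from the query. The `CREATE INDEX ON :Label(property)` form is the older syntax used in this answer; newer Neo4j versions expect `CREATE INDEX FOR (n:Label) ON (n.property)` instead, so adjust the template to your server version.

```python
# Label/property pairs that the import query MERGEs on (copied from the question's Cypher).
MERGE_KEYS = [
    ("City", "cityName"),
    ("Airport", "airportName"),
    ("Flight", "airline"),
    ("Plane", "saleprice"),
]


def index_statements(keys=MERGE_KEYS):
    """Build one CREATE INDEX statement per (label, property) pair."""
    # Older index syntax, as in the answer above; an assumption about the server version.
    return ["CREATE INDEX ON :{}({})".format(label, prop) for label, prop in keys]


def create_indexes(graph, keys=MERGE_KEYS):
    """Run each statement on its own, since each is a separate schema operation.

    `graph` is assumed to expose run(cypher), as py2neo's Graph does.
    """
    for statement in index_statements(keys):
        graph.run(statement)
```

With py2neo this would be called once, before the import, for example `create_indexes(graph)` followed by `importFromCSVtoNeo(graph)`.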