What is an efficient way to run a large number of Cypher statements?

Asked: 2018-08-09 08:06:19

Tags: optimization neo4j cypher

I want to import data into my Neo4j database. From my raw data, I generate a lot of Cypher statements.

For example, I have a list of Cypher statements like this (up to a hundred thousand of them):

MERGE (product:PRODUCT{name:'X phone'}) MERGE (product)-[:RATE]-(review:REVIEW{content:'worst phone ever'})
MERGE (product:PRODUCT{name:'X phone'}) MERGE (product)-[:RATE]-(review:REVIEW{content:'cheapest phone ever'})
MERGE (product:PRODUCT{name:'Y phone'}) MERGE (product)-[:RATE]-(review:REVIEW{content:'even worse than phone X'})
MERGE (product:PRODUCT{name:'X phone'}) MERGE (product)-[:RATE]-(review:REVIEW{content:'better than newly release Y version'})

My current solution is to run the Cypher statements from a file, line by line, using the Neo4j driver for Python.

from neo4j.v1 import GraphDatabase
import sys

class CypherClient:
    """
    The client that execute cypher
    """
    def __init__(self, uri, auth):
        self.driver = GraphDatabase.driver(uri, auth=auth)

    def run_cypher(self, cypher):
        """
        execute single cypher
        :param cypher: the cypher in str
        :return: no return anything at all
        """
        with self.driver.session() as session:
            session.run(cypher).single()

if __name__=="__main__":

    """
    execute cypher from file
    each line is independent cypher
    python exec_cypher_file.py outcypher.txt 
    """

    # replace URI and authentication here
    uri = "bolt://localhost:7687"
    auth = ("neo4j", "IAmPusheenTheCat")

    counter = 0

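    if len(sys.argv) < 2:
        # no input file given: print usage (test() was never defined)
        print("usage: python exec_cypher_file.py outcypher.txt")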
    else:
        client = CypherClient(uri, auth)
        infile = sys.argv[1]
        errfile = open(infile+".err.txt", 'w')
        for line in open(infile):
            # print(line)
            try:
                client.run_cypher(line)
            except Exception:  # a bare except would also swallow Ctrl-C
                print(str(counter) + " " + line+"\n")
                errfile.write(str(counter) + " " + line+"\n")
            counter+=1
            if counter % 100 == 0 or counter < 100:
                print(counter)
        errfile.close()
    print('done')

What can I do to run a large number of Cypher statements more efficiently?

1 answer:

Answer 0: (score: 1)

Loading from CSV tends to be very efficient, so if your data is stored in CSV format, you can use LOAD CSV.
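As a rough sketch of the CSV route: instead of generating one MERGE statement per review, the raw data could be written out as a CSV file and loaded with a single statement. The file name `reviews.csv`, the column names `name` and `content`, and the helper `rows_to_csv_text` are assumptions made for illustration; the query simply mirrors the MERGE pattern from the question. Note that, by default, Neo4j resolves `file:///` URLs against its import directory, so the file would need to be placed there.

```python
import csv
import io

# Assumed statement: one LOAD CSV replaces the per-row MERGE statements.
# 'reviews.csv' and the column names are placeholders for this sketch.
LOAD_CSV_QUERY = """
LOAD CSV WITH HEADERS FROM 'file:///reviews.csv' AS row
MERGE (product:PRODUCT {name: row.name})
MERGE (product)-[:RATE]-(review:REVIEW {content: row.content})
"""


def rows_to_csv_text(rows):
    """Serialize (name, content) pairs as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["name", "content"])
    for name, content in rows:
        # csv.writer quotes fields that contain commas or quotes
        writer.writerow([name, content])
    return buffer.getvalue()
```

The returned text can be saved as `reviews.csv`, after which `LOAD_CSV_QUERY` can be run once through the same `session.run(...)` call used in the question.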

Otherwise, you can take a look at Michael Hunger's article on effective batch updates, which uses UNWIND to batch-process a list of inputs.
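A minimal sketch of the UNWIND approach, not taken from the article itself: the data is sent as a list parameter, and one statement handles a whole batch instead of one statement per row. The names `UNWIND_QUERY`, `batches`, and `import_rows`, the row keys `name` and `content`, and the batch size of 1000 are all assumptions chosen for illustration. The `session` argument is expected to be a session obtained from the driver, as in the question.

```python
from itertools import islice

# Assumed query shape, adapted from the MERGE statements in the question.
# $rows is a list of maps, each with 'name' and 'content' keys.
UNWIND_QUERY = """
UNWIND $rows AS row
MERGE (product:PRODUCT {name: row.name})
MERGE (product)-[:RATE]-(review:REVIEW {content: row.content})
"""


def batches(rows, size):
    """Yield successive lists of at most `size` items from `rows`."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def import_rows(session, rows, batch_size=1000):
    """Run one UNWIND statement per batch; return the number of batches sent."""
    sent = 0
    for batch in batches(rows, batch_size):
        # one round trip per batch instead of one per row
        session.run(UNWIND_QUERY, {"rows": batch}).consume()
        sent += 1
    return sent
```

With the client from the question, this could be called as `with client.driver.session() as session: import_rows(session, rows)`, where `rows` is a list such as `[{"name": "X phone", "content": "worst phone ever"}, ...]`. Reusing one session for all batches also avoids opening a new session per statement.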