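Creating dictionaries of unique nodes with unique properties from a CSV in Python, to load into Neo4j

Posted: 2016-11-08 15:12:23

Tags: python csv dictionary neo4j cypher

I'm having trouble wrapping my head around the best way to do this. I can't figure out the dictionary structure for parsing the CSV into a dictionary per node type, each holding unique nodes with unique properties, which I would then load into Neo4j.

Here's what I have so far: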

import csv
from neo4j import GraphDatabase  # official Neo4j Python driver (4.x+ import path)

driver = GraphDatabase.driver("bolt://localhost")
input_location = "/path/to/file/file.csv"
output_location = "/path/to/dir/"


def main():
    origin = {"PORT": [], "NUMBER": [], "CALL_SIGN": []}
    ship = {"BOAT_NAME": [], "BOAT_NUMBER": []}
    shipper = {"NAME": [], "STREET": [], "CITY": [], "ZIPCODE": []}
    destination = {"COUNTRY": [], "CITY": []}
    node_type_list = [origin, ship, shipper, destination]

    # Python 3: open CSV files in text mode with newline=""
    with open(input_location, newline="") as ship_data:
        reader = csv.DictReader(ship_data, delimiter='|')
        print("parsing & uploading data\n")
        for row in reader:
            for node_type in node_type_list:
                for key in node_type:
                    dict_load(node_type, key, row[key])
    send_nodes_to_graph(node_type_list)


def dict_load(node_type, key, value):
    # Record each value once per column. Because every column is
    # deduplicated on its own, values from the same row lose their link.
    try:
        if value not in node_type[key]:
            node_type[key].append(value)
    except Exception as e:
        print(e)


def send_nodes_to_graph(node_type_list):
    session = driver.session()
    # Placeholder: the actual Cypher query has not been written yet
    session.run('''something_like_this:http://stackoverflow.com/questions/40332698/can-a-python-dictionary-be-passed-as-neo4j-literal-maps/40333843#40333843''')
    session.close()


if __name__ == '__main__':
    main()
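One way to fill in the placeholder query, in the spirit of the linked answer, is to pass a list of property maps as a single query parameter and expand it with UNWIND. A minimal sketch, assuming hypothetical labels such as Ship and that one column per node type (e.g. BOAT_NUMBER) is a usable MERGE key; merge_query and send_nodes are illustrative names. Labels and property keys cannot be query parameters in Cypher, so they are formatted into the string and must come from your own code, never from the CSV.

```python
def merge_query(label, key, fields):
    """Build an UNWIND/MERGE query for one node type (label and key are hypothetical)."""
    # MERGE creates a node only if none with this label and key value exists yet.
    query = "UNWIND $rows AS row MERGE (n:%s {%s: row.%s})" % (label, key, key)
    others = [f for f in fields if f != key]
    if others:
        # Remaining fields become properties; a later row with the same key overwrites them.
        query += " SET " + ", ".join("n.%s = row.%s" % (f, f) for f in others)
    return query


def send_nodes(session, label, key, fields, rows):
    """Send all rows in one round trip as the $rows list parameter."""
    session.run(merge_query(label, key, fields), rows=rows)


# Usage with the official driver (not executed here):
# from neo4j import GraphDatabase
# driver = GraphDatabase.driver("bolt://localhost")
# with driver.session() as session:
#     send_nodes(session, "Ship", "BOAT_NUMBER", ["BOAT_NUMBER", "BOAT_NAME"],
#                [{"BOAT_NUMBER": "1276394", "BOAT_NAME": "PESCADO"}])
```

Because MERGE deduplicates on the key inside Neo4j itself, the Python side only has to produce a flat list of dicts per node type. For origin, where NUMBER 91487 appears with two different ports, NUMBER alone would merge both into one node, so a composite key would be needed there.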

My CSV looks like this:

COUNTRY|NUMBER|CALL_SIGN|PORT|BOAT_NAME|BOAT_NUMBER|NAME|STREET|CITY|ST|ZIPCODE
D REP|91487|S DOMINGO|BALTIMORE|PESCADO|1276394|JH FWEICH|9874 LOMBARDO WAY|PORT ELIZABETH|NJ|8348
D REP|91487|S DOMINGO|VA BEACH|TROPIC|9872347|JH FWEICH|9874 LOMBARDO WAY|PORT ELIZABETH|NJ|8348
D REP|91487|S DOMINGO|VA BEACH|TROPIC|9872347|JH FWEICH|9874 LOMBARDO WAY|PORT ELIZABETH|NJ|8348
D REP|91487|S DOMINGO|VA BEACH|CAPRICORN|8761231|JH FWEICH|9874 LOMBARDO WAY|PORT ELIZABETH|NJ|8348

My dict structure currently produces this:

origin {'NUMBER': ['91487'], 'CALL_SIGN': ['S DOMINGO'], 'PORT': ['BALTIMORE', 'VA BEACH']}

But I think it needs to look more like this in order to load unique nodes into Neo4j:

origin {'91487': {'CALL_SIGN': 'S DOMINGO', 'PORT': 'BALTIMORE'}}
origin {'91487': {'CALL_SIGN': 'S DOMINGO', 'PORT': 'VA BEACH'}}
ship {'1276394': {'BOAT_NAME': 'PESCADO'}}
shipper {'JH FWEICH': {'STREET': '9874 LOMBARDO WAY', 'CITY': 'PORT ELIZABETH'}}
etc.
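A single Python dict cannot hold the key '91487' twice, so the two origin entries above cannot both live in one dict keyed by NUMBER. A minimal sketch that instead keeps every distinct combination of properties per node type, assuming the file is pipe-delimited as the DictReader call implies; SAMPLE, NODE_TYPES and unique_nodes are hypothetical names, and destination reads the same single CITY column as shipper, just as the question's dicts do.

```python
import csv
import io

# Hypothetical inline copy of the sample rows, assumed pipe-delimited.
SAMPLE = """COUNTRY|NUMBER|CALL_SIGN|PORT|BOAT_NAME|BOAT_NUMBER|NAME|STREET|CITY|ST|ZIPCODE
D REP|91487|S DOMINGO|BALTIMORE|PESCADO|1276394|JH FWEICH|9874 LOMBARDO WAY|PORT ELIZABETH|NJ|8348
D REP|91487|S DOMINGO|VA BEACH|TROPIC|9872347|JH FWEICH|9874 LOMBARDO WAY|PORT ELIZABETH|NJ|8348
D REP|91487|S DOMINGO|VA BEACH|TROPIC|9872347|JH FWEICH|9874 LOMBARDO WAY|PORT ELIZABETH|NJ|8348
D REP|91487|S DOMINGO|VA BEACH|CAPRICORN|8761231|JH FWEICH|9874 LOMBARDO WAY|PORT ELIZABETH|NJ|8348
"""

# Ordered field lists per node type (same columns as the question's dicts).
NODE_TYPES = {
    "origin": ["NUMBER", "CALL_SIGN", "PORT"],
    "ship": ["BOAT_NUMBER", "BOAT_NAME"],
    "shipper": ["NAME", "STREET", "CITY", "ZIPCODE"],
    "destination": ["COUNTRY", "CITY"],
}


def unique_nodes(reader, node_types):
    """Return {node_type: [distinct property dicts]} in first-seen order."""
    seen = {name: set() for name in node_types}
    nodes = {name: [] for name in node_types}
    for row in reader:
        for name, fields in node_types.items():
            # A tuple of the row's values identifies one node of this type.
            values = tuple(row[f] for f in fields)
            if values not in seen[name]:
                seen[name].add(values)
                nodes[name].append(dict(zip(fields, values)))
    return nodes


nodes = unique_nodes(csv.DictReader(io.StringIO(SAMPLE), delimiter="|"), NODE_TYPES)
for name, props in nodes.items():
    print(name, props)
```

Each row is split into every node type at once, so the values that belong together stay together, and the duplicate third row adds nothing new. Each resulting list of dicts is already in the shape a parameterized Cypher query can take as a list parameter.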

2 Answers:

Answer 0 (score: 1)

Maybe try something like this?

import csv

origin = {"PORT", "NUMBER", "CALL_SIGN"}
ship = {"BOAT_NAME", "BOAT_NUMBER"}
shipper = {"NAME", "STREET", "CITY", "ZIPCODE"}
destination = {"COUNTRY", "CITY"}
node_type_list = [origin, ship, shipper, destination]

with open(input_location, newline="") as ship_data:
    reader = csv.DictReader(ship_data, delimiter='|')
    print("parsing & uploading data\n")
    for row in reader:
        # note: dict_list is rebuilt for every row
        dict_list = [{row["NUMBER"]: {key: row[key] for key in sublist}} for sublist in node_type_list]

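This builds a list of dicts, each mapping NUMBER to a slice of the input row. I'm not very familiar with Neo4j, but hopefully this is closer to what you want. The output should look like [{'91487': {'CALL_SIGN': 'S DOMINGO', 'PORT': 'BALTIMORE'}}, {'91487': {'BOAT_NUMBER': '1276394', 'BOAT_NAME': 'PESCADO'}}, ...]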
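To see exactly what the comprehension produces, here it is run on the first sample row from an in-memory string (a hypothetical pipe-delimited copy of the data). Two details show up: every node type is keyed by the row's NUMBER, not by its own identifier, and because NUMBER is in the origin set, the origin dict also carries NUMBER as a property.

```python
import csv
import io

# Hypothetical pipe-delimited copy of the first sample row.
SAMPLE = """COUNTRY|NUMBER|CALL_SIGN|PORT|BOAT_NAME|BOAT_NUMBER|NAME|STREET|CITY|ST|ZIPCODE
D REP|91487|S DOMINGO|BALTIMORE|PESCADO|1276394|JH FWEICH|9874 LOMBARDO WAY|PORT ELIZABETH|NJ|8348
"""

origin = {"PORT", "NUMBER", "CALL_SIGN"}
ship = {"BOAT_NAME", "BOAT_NUMBER"}
shipper = {"NAME", "STREET", "CITY", "ZIPCODE"}
destination = {"COUNTRY", "CITY"}
node_type_list = [origin, ship, shipper, destination]

reader = csv.DictReader(io.StringIO(SAMPLE), delimiter="|")
for row in reader:
    # Same comprehension as above: every dict is keyed by row["NUMBER"].
    dict_list = [{row["NUMBER"]: {key: row[key] for key in sublist}} for sublist in node_type_list]
    print(dict_list)
```

Sets also have no guaranteed order, so the properties may print in a different order from run to run; the dicts still compare equal.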

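Answer 1 (score: 0)

For clarity, I want to show the changes that @PatrickHaugh led me to. I want to give him credit for the logic behind the answer, so I accepted his answer and am adding my update here. His suggestion comes first; my actual version follows it.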

import csv

origin = {"PORT", "NUMBER", "CALL_SIGN"}
ship = {"BOAT_NAME", "BOAT_NUMBER"}
shipper = {"NAME", "STREET", "CITY", "ZIPCODE"}
destination = {"COUNTRY", "CITY"}
node_type_list = [origin, ship, shipper, destination]

with open(input_location, newline="") as ship_data:
    reader = csv.DictReader(ship_data, delimiter='|')
    for row in reader:
        dict_list = [{row["NUMBER"]: {key: row[key] for key in sublist}} for sublist in node_type_list]

Output: [{'91487': {'CALL_SIGN': 'S DOMINGO', 'PORT': 'BALTIMORE'}}, {'91487': {'BOAT_NUMBER': '1276394', 'BOAT_NAME': 'PESCADO'}}, ...]

**Version with the correct output:**

import csv

# moved the keys around so that the first key is the identifier for the sublist
# (lists keep their order, unlike sets, so sublist[0] is always the identifier)
origin = ["NUMBER", "PORT", "CALL_SIGN"]
ship = ["BOAT_NUMBER", "BOAT_NAME"]
shipper = ["NAME", "STREET", "CITY", "ZIPCODE"]
destination = ["COUNTRY", "CITY"]
node_type_list = [origin, ship, shipper, destination]

with open(input_location, newline="") as ship_data:
    reader = csv.DictReader(ship_data, delimiter='|')
    for row in reader:
        # uses the first element of each sublist as the dict key and the remaining elements as its properties
        dict_list = [{row[sublist[0]]: {key: row[key] for key in sublist[1:]}} for sublist in node_type_list]

Output: [{'91487': {'PORT': 'VA BEACH', 'CALL_SIGN': 'S DOMINGO'}}, {'8761231': {'BOAT_NAME': 'CAPRICORN'}}, {'JH FWEICH': {'CITY': 'PORT ELIZABETH'...}}]
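Because dict_list is reassigned on every row, only the last row's dicts survive the loop, which is why this output shows VA BEACH and CAPRICORN. A minimal sketch that merges each row's dicts into one dict per node type, assuming the same pipe-delimited data (SAMPLE and nodes_by_type are hypothetical names). With dict.update, a repeated identifier overwrites the earlier entry, so origin 91487 keeps only its last port; keeping both would need a composite key or a set of property tuples.

```python
import csv
import io

# Hypothetical pipe-delimited copy of the question's sample rows.
SAMPLE = """COUNTRY|NUMBER|CALL_SIGN|PORT|BOAT_NAME|BOAT_NUMBER|NAME|STREET|CITY|ST|ZIPCODE
D REP|91487|S DOMINGO|BALTIMORE|PESCADO|1276394|JH FWEICH|9874 LOMBARDO WAY|PORT ELIZABETH|NJ|8348
D REP|91487|S DOMINGO|VA BEACH|TROPIC|9872347|JH FWEICH|9874 LOMBARDO WAY|PORT ELIZABETH|NJ|8348
D REP|91487|S DOMINGO|VA BEACH|TROPIC|9872347|JH FWEICH|9874 LOMBARDO WAY|PORT ELIZABETH|NJ|8348
D REP|91487|S DOMINGO|VA BEACH|CAPRICORN|8761231|JH FWEICH|9874 LOMBARDO WAY|PORT ELIZABETH|NJ|8348
"""

origin = ["NUMBER", "PORT", "CALL_SIGN"]
ship = ["BOAT_NUMBER", "BOAT_NAME"]
shipper = ["NAME", "STREET", "CITY", "ZIPCODE"]
destination = ["COUNTRY", "CITY"]
node_type_list = [origin, ship, shipper, destination]

# One dict per node type, accumulated over all rows instead of rebuilt per row.
nodes_by_type = [{} for _ in node_type_list]

reader = csv.DictReader(io.StringIO(SAMPLE), delimiter="|")
for row in reader:
    dict_list = [{row[sublist[0]]: {key: row[key] for key in sublist[1:]}} for sublist in node_type_list]
    for acc, d in zip(nodes_by_type, dict_list):
        acc.update(d)  # a repeated identifier overwrites the earlier entry

origin_nodes, ship_nodes, shipper_nodes, destination_nodes = nodes_by_type
print(ship_nodes)
```

Duplicate rows, like the repeated TROPIC line, collapse naturally because they map to the same key with the same properties.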