How to create Neo4j relationships between nodes in the Yelp dataset

Date: 2017-11-15 05:37:35

Tags: neo4j

I'm new to Neo4j. I'm trying to load the Yelp dataset into Neo4j. I'm mainly interested in three of the JSON files they provide:

user.json

{
    "user_id": "-lGwMGHMC_XihFJNKCJNRg",
    "name": "Gabe",
    "review_count": 277,
    "yelping_since": "2014-10-31",
    "friends": ["Oa84FFGBw1axX8O6uDkmqg", "SRcWERSl4rhm-Bz9zN_J8g", "VMVGukgapRtx3MIydAibkQ", "8sLNQ3dAV35VBCnPaMh1Lw", "87LhHHXbQYWr5wlo5W7_QQ"],
    "useful": 45,
    "funny": 4,
    "cool": 55,
    "fans": 17,
    "elite": [],
    "average_stars": 4.72,
    "compliment_hot": 5,
    "compliment_more": 1,
    "compliment_profile": 0,
    "compliment_cute": 1,
    "compliment_list": 0,
    "compliment_note": 11,
    "compliment_plain": 20,
    "compliment_cool": 15,
    "compliment_funny": 15,
    "compliment_writer": 1,
    "compliment_photos": 8
}

I have omitted several entries from the friends array to keep the output readable.

business.json

{
    "business_id": "YDf95gJZaq05wvo7hTQbbQ",
    "name": "Richmond Town Square",
    "neighborhood": "",
    "address": "691 Richmond Rd",
    "city": "Richmond Heights",
    "state": "OH",
    "postal_code": "44143",
    "latitude": 41.5417162,
    "longitude": -81.4931165,
    "stars": 2.0,
    "review_count": 17,
    "is_open": 1,
    "attributes": {
        "RestaurantsPriceRange2": 2,
        "BusinessParking": {
            "garage": false,
            "street": false,
            "validated": false,
            "lot": true,
            "valet": false
        },
        "BikeParking": true,
        "WheelchairAccessible": true
    },
    "categories": ["Shopping", "Shopping Centers"],
    "hours": {
        "Monday": "10:00-21:00",
        "Tuesday": "10:00-21:00",
        "Friday": "10:00-21:00",
        "Wednesday": "10:00-21:00",
        "Thursday": "10:00-21:00",
        "Sunday": "11:00-18:00",
        "Saturday": "10:00-21:00"
    }
}

review.json

{
    "review_id": "VfBHSwC5Vz_pbFluy07i9Q",
    "user_id": "-lGwMGHMC_XihFJNKCJNRg",
    "business_id": "YDf95gJZaq05wvo7hTQbbQ",
    "stars": 5,
    "date": "2016-07-12",
    "text": "My girlfriend and I stayed here for 3 nights and loved it.",
    "useful": 0,
    "funny": 0,
    "cool": 0
}

As the sample files show, users and businesses are connected through the review.json file. How can I use review.json to create relationship edges between users and businesses?

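I have also looked at Mark Needham's tutorial, where he loads StackOverflow data, but in that case the relationship files already come with the sample data. Do I need to build a similar file? If so, how should I approach it? Or is there another way to create the relationships between users and businesses?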
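If you did want a separate relationship file in the style of that tutorial, you could derive it from review.json yourself. A minimal Python sketch, assuming review.json is in JSON Lines format (one review object per line); the output column names are a hypothetical choice for illustration:

```python
import csv
import json
from typing import Iterator


def read_json_lines(path: str) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def write_review_relationships(review_path: str, out_path: str) -> int:
    """Write one CSV row per review linking a user to a business.

    Columns (hypothetical naming): user_id, business_id, review_id, stars, date.
    Returns the number of data rows written (the header is not counted).
    """
    count = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["user_id", "business_id", "review_id", "stars", "date"])
        for review in read_json_lines(review_path):
            writer.writerow([
                review["user_id"],
                review["business_id"],
                review["review_id"],
                review["stars"],
                review["date"],
            ])
            count += 1
    return count
```

The resulting CSV could then be read with LOAD CSV WITH HEADERS, matching User and Business nodes on row.user_id and row.business_id. With APOC, as in the answer below, this intermediate file is not strictly necessary.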

1 answer:

Answer 0 (score: 1)

It depends a lot on your model, but you could do it in three imports:

//Create Users - does assume the data is unique
CALL apoc.load.json('file:///c://temp//SO//user.json') YIELD value AS user
CREATE (u:User)
SET u = user
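The CREATE above assumes every user_id appears only once (otherwise MERGE on user_id would be the safer choice). If you're not sure the file satisfies that, a quick pre-check can list duplicate IDs before importing. A minimal Python sketch, assuming user.json is in JSON Lines format:

```python
import json
from collections import Counter
from typing import Iterable, Iterator, List


def read_json_lines(path: str) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def duplicate_ids(records: Iterable[dict], key: str) -> List[str]:
    """Return the values of `key` that occur more than once, sorted."""
    counts = Counter(record[key] for record in records)
    return sorted(value for value, n in counts.items() if n > 1)


# Example usage (path is hypothetical):
# print(duplicate_ids(read_json_lines("user.json"), "user_id"))
```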

Then add the businesses:

CALL apoc.load.json('file:///c://temp//SO//business.json') YIELD value AS business
CREATE (b:Business {
            business_id     : business.business_id,
            name            : business.name,
            neighborhood    : business.neighborhood,
            address         : business.address,
            city            : business.city,
            state           : business.state,
            postal_code     : business.postal_code,
            latitude        : business.latitude,
            longitude       : business.longitude,
            stars           : business.stars,
            review_count    : business.review_count,
            is_open         : business.is_open,
            categories      : business.categories
        })

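For the businesses we can't just do SET b = business, because the JSON contains nested maps (attributes, hours) and Neo4j property values cannot be maps. So you'll need to decide whether you need that data at all; if you do, you may have to take a different route.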
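One such route, if you want to keep the nested data, is to flatten it into top-level keys before import, since Neo4j properties can only hold primitive values or arrays of primitives. A minimal Python sketch of that preprocessing step; the underscore-joined key naming (for example attributes_BusinessParking_lot) is a hypothetical convention, not something Yelp or Neo4j prescribes:

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], prefix: str = "", sep: str = "_") -> Dict[str, Any]:
    """Recursively flatten nested dicts into a single-level dict.

    Nested keys are joined with `sep`; non-dict values (including lists
    such as categories) are kept unchanged.
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


business = {
    "business_id": "YDf95gJZaq05wvo7hTQbbQ",
    "attributes": {"BusinessParking": {"lot": True}},
    "hours": {"Monday": "10:00-21:00"},
}
print(flatten(business))
# {'business_id': 'YDf95gJZaq05wvo7hTQbbQ', 'attributes_BusinessParking_lot': True, 'hours_Monday': '10:00-21:00'}
```

If the flattened records are written back out as JSON, SET b = business would then work the same way it does for users.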

Finally, the reviews, which is where everything gets joined together.

CALL apoc.load.json('file:///c://temp//SO//review.json') YIELD value AS review
CREATE (r:Review)
SET r = review
WITH r
//Match user to a review
MATCH (u:User {user_id: r.user_id})
CREATE (u)-[:HAS_REVIEW]->(r)
WITH r, u
//Match business to a review, and a user to a business
MATCH (b:Business {business_id: r.business_id})
//Merge here in case of multiple reviews
MERGE (u)-[:HAS_REVIEWED]->(b)
CREATE (b)-[:HAS_REVIEW]->(r)
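Because HAS_REVIEWED is created with MERGE while both HAS_REVIEW relationships are created with CREATE, you would expect two HAS_REVIEW relationships per review (one from the user, one from the business) but only one HAS_REVIEWED per distinct user/business pair. A small Python sketch for computing those expected counts from review.json, as a sanity check after the import; it assumes JSON Lines input and that every referenced user and business was imported (rows whose MATCH fails produce no relationships):

```python
import json
from typing import Dict, Iterable, Iterator


def read_json_lines(path: str) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def expected_relationship_counts(reviews: Iterable[dict]) -> Dict[str, int]:
    """Expected relationship counts for the review import above.

    - HAS_REVIEW: (User)->(Review) plus (Business)->(Review), so 2 per review.
    - HAS_REVIEWED: MERGEd, so 1 per distinct (user_id, business_id) pair.
    """
    review_count = 0
    pairs = set()
    for review in reviews:
        review_count += 1
        pairs.add((review["user_id"], review["business_id"]))
    return {"HAS_REVIEW": 2 * review_count, "HAS_REVIEWED": len(pairs)}


# Example usage (path is hypothetical):
# print(expected_relationship_counts(read_json_lines("review.json")))
```

The numbers can be compared against a query such as MATCH ()-[r:HAS_REVIEWED]->() RETURN count(r).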

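Obviously, change the labels and relationship types to whatever you want. Depending on the size of the data you may also need to tune this, in which case you'll probably want to run it through apoc.periodic.iterate. Creating indexes on :User(user_id) and :Business(business_id) before the review import also keeps the MATCH lookups from scanning every node with that label.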

APOC is here if you need it (and you should be using it!)
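apoc.periodic.iterate batches the work inside the database. A different technique, if you'd rather drive the import from a client, is to send the reviews in fixed-size batches and UNWIND each batch in one query. A minimal Python sketch of the batching side, assuming JSON Lines input; the query text and the $rows parameter name are hypothetical, and the call that actually sends it (for example through the official neo4j Python driver) is only shown as a comment so the sketch has no external dependencies:

```python
import json
from itertools import islice
from typing import Iterable, Iterator, List

# Hypothetical query: each call receives one batch as the $rows parameter.
# Unlike the answer's query, it matches both endpoints first, so a review whose
# user or business is missing is skipped rather than created without links.
REVIEW_BATCH_QUERY = """
UNWIND $rows AS review
MATCH (u:User {user_id: review.user_id})
MATCH (b:Business {business_id: review.business_id})
CREATE (r:Review)
SET r = review
CREATE (u)-[:HAS_REVIEW]->(r)
CREATE (b)-[:HAS_REVIEW]->(r)
MERGE (u)-[:HAS_REVIEWED]->(b)
"""


def read_json_lines(path: str) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def batches(rows: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield lists of at most `size` rows until the input is exhausted."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Hypothetical usage with an open neo4j driver session:
# for batch in batches(read_json_lines("review.json"), 10_000):
#     session.run(REVIEW_BATCH_QUERY, rows=batch)
```

Batch size is a tuning knob: larger batches mean fewer round trips but bigger transactions, which is the same trade-off the batchSize option of apoc.periodic.iterate controls.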