JSON column with nested values

Time: 2015-09-23 17:38:03

Tags: python json mongodb pandas pymongo

I have two columns of data, restaurant name and reviewer grades:

   name                            grades
0  Honey'S Thai Pavilion           [{u'date': 2014-08-12 00:00:00, u'grade'..  
1  Siam Sqaure Thai Cuisine        [{u'date': 2014-11-06 00:00:00, u'grade'...

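The problem is that one column is a list of multiple 'date', 'grade', and 'score' pairings in JSON (technically BSON, since this is the example dataset from the MongoDB tutorial). I need to break up the grades column so that I get a resulting data frame like this: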

name                       Date                   Grade         Score
Honey'S Thai Pavilion      2014-08-12 00:00:00    A             6
Honey'S Thai Pavilion      2015-03-14 00:00:00    B             5
Honey'S Thai Pavilion      2013-07-15 00:00:00    C             6
Siam Sqaure Thai Cuisine   2014-11-06 00:00:00    A             3
Siam Sqaure Thai Cuisine   2015-06-06 00:00:00    B             2

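So I need to split one column while keeping the restaurant name. The code below turns the grades column into a nice data frame, but I can't figure out how to keep the restaurant names.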

    from pymongo import MongoClient
    import pymongo
    import pandas as pd

    client = MongoClient()

    db = client.test

    cursor2 = db.restaurants.find().sort([
        ("borough", pymongo.ASCENDING),
        ("cuisine", pymongo.DESCENDING)
    ])

    #cursor.sort("cuisine",pymongo.ASCENDING)
    data = pd.DataFrame(list(cursor2))[['name', 'grades']]

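    # Build one DataFrame per restaurant from its list of grade dicts.
    data_list = []
    for i in range(len(data.grades)):
        g_data = pd.DataFrame(data.grades[i])
        data_list.append(g_data)

    # Stack the per-restaurant frames; the restaurant name is lost at this step.
    result = pd.concat(data_list)
    print(result.head(100))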

1 answer:

Answer 0 (score: 1)

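I don't know pandas very well, but you can use a generator expression to flatten the results of the mongo cursor, then feed the generator to a pandas DataFrame, like this: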

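    # Note: iterate a fresh cursor here; calling list(cursor2) beforehand
    # would already have exhausted it.
    flattened_data = (
        {
            'name': record['name'],
            'date': grade['date'],
            'grade': grade['grade'],
            'score': grade.get('score')  # None when a grade has no score
        }
        for record in cursor2
        for grade in record['grades']
    )
    # Select the columns explicitly to fix their order.
    result = pd.DataFrame(flattened_data)[['name', 'date', 'grade', 'score']]
    print(result.head(100))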

This way, you don't need to build the data_list list in a for loop.
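
If you want to try the flattening step without a running MongoDB instance, here is a minimal sketch of the same idea on in-memory data. The `records` list is made-up sample data shaped like the documents in the question (values copied from the desired output table), and `flatten_grades` is just a hypothetical helper name for illustration, not part of pymongo or pandas. It also assumes some documents might lack a `grades` or `score` field, so it uses `.get()` throughout.

```python
from datetime import datetime

# Hypothetical sample records shaped like the documents in the question
# (values copied from the desired output table, not real query results).
records = [
    {
        "name": "Honey'S Thai Pavilion",
        "grades": [
            {"date": datetime(2014, 8, 12), "grade": "A", "score": 6},
            {"date": datetime(2015, 3, 14), "grade": "B", "score": 5},
            {"date": datetime(2013, 7, 15), "grade": "C", "score": 6},
        ],
    },
    {
        "name": "Siam Sqaure Thai Cuisine",
        "grades": [
            {"date": datetime(2014, 11, 6), "grade": "A", "score": 3},
            {"date": datetime(2015, 6, 6), "grade": "B", "score": 2},
        ],
    },
]


def flatten_grades(docs):
    """Yield one flat dict per (restaurant, grade) pair."""
    for doc in docs:
        # .get() guards against documents that have no 'grades' field.
        for grade in doc.get("grades", []):
            yield {
                "name": doc.get("name"),
                "date": grade.get("date"),
                "grade": grade.get("grade"),
                "score": grade.get("score"),
            }


# Materialize the generator so the rows can be inspected or reused.
rows = list(flatten_grades(records))
for row in rows:
    print(row["name"], row["date"], row["grade"], row["score"])
```

Passing `rows` (or the generator itself) to `pd.DataFrame(...)` and selecting `[['name', 'date', 'grade', 'score']]` should then give one row per grade with the restaurant name repeated, as in the desired output.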