I have two columns of data, restaurant name and reviewer grades:
name grades
0 Honey'S Thai Pavilion [{u'date': 2014-08-12 00:00:00, u'grade'..
1 Siam Sqaure Thai Cuisine [{u'date': 2014-11-06 00:00:00, u'grade'...
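The problem is that one column is a list of several 'date', 'grade' and 'score' pairs in JSON (technically BSON, since this is the sample dataset from the MongoDB tutorial). I need to break out the grades column so that I get a resulting DataFrame like this: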
name Date Grade Score
Honey'S Thai Pavilion 2014-08-12 00:00:00 A 6
Honey'S Thai Pavilion 2015-03-14 00:00:00 B 5
Honey'S Thai Pavilion 2013-07-15 00:00:00 C 6
Siam Sqaure Thai Cuisine 2014-11-06 00:00:00 A 3
Siam Sqaure Thai Cuisine 2015-06-06 00:00:00 B 2
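So I need to split one column but keep the restaurant name. The code below turns the grades column into a nice DataFrame, but I can't figure out how to keep the restaurant name.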
from pymongo import MongoClient
import pymongo
import pandas as pd
client = MongoClient()
db = client.test
cursor2 = db.restaurants.find().sort([
("borough", pymongo.ASCENDING),
("cuisine", pymongo.DESCENDING)
])
#cursor.sort("cuisine",pymongo.ASCENDING)
data = pd.DataFrame(list(cursor2))[['name', 'grades']]
data_list = []
for i in range(0, len(data.grades)):
    # each entry of 'grades' is a list of dicts; turn it into its own DataFrame
    g_data = pd.DataFrame(data.grades[i])
    data_list.append(g_data)
result = pd.concat(data_list)
print(result.head(100))
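A minimal sketch of one way to keep the name with the loop approach above: tag each per-restaurant frame with its restaurant name before concatenating. The sample records here are hypothetical stand-ins for the MongoDB documents (with dates as plain strings instead of datetimes) so that the sketch runs without a database.

```python
import pandas as pd

# Hypothetical sample records shaped like the MongoDB 'restaurants' documents
records = [
    {"name": "Honey'S Thai Pavilion",
     "grades": [{"date": "2014-08-12", "grade": "A", "score": 6},
                {"date": "2015-03-14", "grade": "B", "score": 5}]},
    {"name": "Siam Sqaure Thai Cuisine",
     "grades": [{"date": "2014-11-06", "grade": "A", "score": 3}]},
]
data = pd.DataFrame(records)[["name", "grades"]]

data_list = []
for name, grades in zip(data["name"], data["grades"]):
    # one row per grade entry; columns come from the dict keys
    g_data = pd.DataFrame(grades)
    # add the restaurant name as a constant column so it survives the concat
    g_data["name"] = name
    data_list.append(g_data)

# ignore_index=True gives a fresh 0..n-1 index instead of repeated per-restaurant indices
result = pd.concat(data_list, ignore_index=True)[["name", "date", "grade", "score"]]
print(result)
```

Iterating with `zip` over the two columns avoids indexing by position, and the only real change from the original loop is the `g_data["name"] = name` line.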
Answer (score: 1):
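I don't know pandas very well, but you can use a generator expression to flatten the results of the mongo cursor and then feed the generator to a pandas DataFrame, like this: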
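# one dict per (restaurant, grade) pair; cursor2 must be a fresh cursor,
# not one already consumed by list(cursor2)
flattened_data = (
    {
        'name': record['name'],
        'date': grade['date'],
        'grade': grade['grade'],
        'score': grade.get('score')  # None if a grade entry has no score
    }
    for record in cursor2
    for grade in record['grades']
)
result = pd.DataFrame(flattened_data)[['name', 'date', 'grade', 'score']]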
print(result.head(100))
This way, you don't need to build the data_list list in a for loop.
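An alternative not taken from the answer above: pandas' `json_normalize` can do the same flattening directly, with `record_path` naming the nested list to expand into rows and `meta` listing the parent fields to repeat on each row. This is a sketch assuming pandas 1.0 or later (where `pd.json_normalize` is a top-level function); the sample records are hypothetical, and with MongoDB you would pass something like `list(db.restaurants.find())` instead.

```python
import pandas as pd

# Hypothetical sample records shaped like the MongoDB 'restaurants' documents
records = [
    {"name": "Honey'S Thai Pavilion",
     "grades": [{"date": "2014-08-12", "grade": "A", "score": 6},
                {"date": "2015-03-14", "grade": "B", "score": 5}]},
    {"name": "Siam Sqaure Thai Cuisine",
     "grades": [{"date": "2014-11-06", "grade": "A", "score": 3}]},
]

# record_path: the nested list whose dicts become rows
# meta: top-level fields copied onto every row produced from that record
result = pd.json_normalize(records, record_path="grades", meta=["name"])
result = result[["name", "date", "grade", "score"]]
print(result)
```

`json_normalize` puts the `meta` columns after the record columns, hence the final column selection to match the desired layout.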