我有这样的数据框:
Name Nationality Tall Age
John USA 190 24
Thomas French 194 25
Anton Malaysia 180 23
Chris Argentina 190 26
所以说我有这样的传入数据结构。每个元素代表每行的数据。 :
data = [{
'food':{'lunch':'Apple',
'breakfast':'Milk',
'dinner':'Meatball'},
'drink':{'favourite':'coke',
'dislike':'juice'}
},
..//and 3 other records
].
'数据'是一些可以节省机器学习预测食物和饮料的变量。有更多的记录(约400k行),但我按批量处理它们(现在我通过迭代处理2k数据)。预期结果如:
Name Nationality Tall Age Lunch Breakfast Dinner Favourite Dislike
John USA 190 24 Apple Milk Meatball Coke Juice
Thomas French 194 25 ....
Anton Malaysia 180 23 ....
Chris Argentina 190 26 ....
有没有一种有效的方法来实现数据帧?到目前为止,我已经尝试迭代数据变量并获得每个预测标签的值。感觉这个过程花了很多时间。
答案 0 :(得分:0)
首先需要flatenning dictionaries,创建DataFrame
并加入原始版本:
data = [{
'a':{'lunch':'Apple',
'breakfast':'Milk',
'dinner':'Meatball'},
'b':{'favourite':'coke',
'dislike':'juice'}
},
{
'a':{'lunch':'Apple1',
'breakfast':'Milk1',
'dinner':'Meatball2'},
'b':{'favourite':'coke2',
'dislike':'juice3'}
},
{
'a':{'lunch':'Apple4',
'breakfast':'Milk5',
'dinner':'Meatball4'},
'b':{'favourite':'coke2',
'dislike':'juice4'}
},
{
'a':{'lunch':'Apple3',
'breakfast':'Milk8',
'dinner':'Meatball7'},
'b':{'favourite':'coke4',
'dislike':'juice1'}
}
]
#or use another solutions, both are nice
L = [{k: v for x in d.values() for k, v in x.items()} for d in data]
df1 = pd.DataFrame(L)
print (df1)
breakfast dinner dislike favourite lunch
0 Milk Meatball juice coke Apple
1 Milk1 Meatball2 juice3 coke2 Apple1
2 Milk5 Meatball4 juice4 coke2 Apple4
3 Milk8 Meatball7 juice1 coke4 Apple3
df2 = df.join(df1)
print (df2)
Name Nationality Tall Age breakfast dinner dislike favourite \
0 John USA 190 24 Milk Meatball juice coke
1 Thomas French 194 25 Milk1 Meatball2 juice3 coke2
2 Anton Malaysia 180 23 Milk5 Meatball4 juice4 coke2
3 Chris Argentina 190 26 Milk8 Meatball7 juice1 coke4
lunch
0 Apple
1 Apple1
2 Apple4
3 Apple3