Question

I have a DataFrame like this:

Name    Nationality    Tall    Age
John    USA            190     24
Thomas  French         194     25
Anton   Malaysia       180     23
Chris   Argentina      190     26

I also receive incoming data structured like this, where each element holds the data for one row:

data = [{
         'food':{'lunch':'Apple',
                'breakfast':'Milk',
                'dinner':'Meatball'},
         'drink':{'favourite':'coke',
                   'dislike':'juice'}
         },
         # ... and 3 other records
       ]

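'data' is a variable that holds the machine-learning predictions for food and drink. There are more records (about 400k rows), but I process them in batches (currently 2k records per iteration). The expected result looks like this: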

Name    Nationality    Tall    Age Lunch Breakfast Dinner   Favourite Dislike
John    USA            190     24  Apple Milk      Meatball Coke      Juice
Thomas  French         194     25  ....
Anton   Malaysia       180     23  ....
Chris   Argentina      190     26  ....

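Is there an efficient way to build this DataFrame? So far I have tried iterating over the data variable and pulling out the value of each predicted label, but that feels very slow.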

Answer 1

First flatten the dictionaries, then create a DataFrame from them and join it to the original:

import pandas as pd

data = [{
         'a':{'lunch':'Apple',
                'breakfast':'Milk',
                'dinner':'Meatball'},
         'b':{'favourite':'coke',
              'dislike':'juice'}
         },
         {
         'a':{'lunch':'Apple1',
                'breakfast':'Milk1',
                'dinner':'Meatball2'},
         'b':{'favourite':'coke2',
              'dislike':'juice3'}
         },
         {
         'a':{'lunch':'Apple4',
                'breakfast':'Milk5',
                'dinner':'Meatball4'},
         'b':{'favourite':'coke2',
              'dislike':'juice4'}
         },
         {
         'a':{'lunch':'Apple3',
                'breakfast':'Milk8',
                'dinner':'Meatball7'},
         'b':{'favourite':'coke4',
              'dislike':'juice1'}
         }
]

# or use one of the other solutions; both work well
L = [{k: v for x in d.values() for k, v in x.items()} for d in data]

df1 = pd.DataFrame(L)
print (df1)
  breakfast     dinner dislike favourite   lunch
0      Milk   Meatball   juice      coke   Apple
1     Milk1  Meatball2  juice3     coke2  Apple1
2     Milk5  Meatball4  juice4     coke2  Apple4
3     Milk8  Meatball7  juice1     coke4  Apple3
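
As a possible alternative to the dict comprehension above, the flattening step can be sketched with pandas' `json_normalize`. This is not the approach used in this answer, only a variant; it assumes a pandas version that exposes `pd.json_normalize` (1.0 or later) and that the inner keys are unique across the outer groups, since only the inner key is kept as the column name. The `records` list is a shortened copy of the sample data.

```python
import pandas as pd

# Two sample records shaped like the ones above (shortened for illustration).
records = [
    {"a": {"lunch": "Apple", "breakfast": "Milk", "dinner": "Meatball"},
     "b": {"favourite": "coke", "dislike": "juice"}},
    {"a": {"lunch": "Apple1", "breakfast": "Milk1", "dinner": "Meatball2"},
     "b": {"favourite": "coke2", "dislike": "juice3"}},
]

# json_normalize flattens nested dicts into dotted column names such as "a.lunch".
flat = pd.json_normalize(records)

# Keep only the inner key; this assumes inner keys do not collide across groups.
flat.columns = [col.split(".")[-1] for col in flat.columns]
print(flat)
```

Whether this is faster than the comprehension depends on the data, so it is worth timing both on a real batch.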

# df is the original DataFrame from the question; join aligns rows on the index
df2 = df.join(df1)
print (df2)
     Name Nationality  Tall  Age breakfast     dinner dislike favourite  \
0    John         USA   190   24      Milk   Meatball   juice      coke   
1  Thomas      French   194   25     Milk1  Meatball2  juice3     coke2   
2   Anton    Malaysia   180   23     Milk5  Meatball4  juice4     coke2   
3   Chris   Argentina   190   26     Milk8  Meatball7  juice1     coke4   

    lunch  
0   Apple  
1  Apple1  
2  Apple4  
3  Apple3
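
Since the question mentions processing about 400k rows in batches of 2k, here is a minimal sketch of how the same flatten-and-join idea might be applied batch by batch. The helper names `flatten` and `join_in_batches` are hypothetical, and the sketch assumes that `data` is non-empty, has one record per row of `df` in the same order, and that `df` has a unique index.

```python
import pandas as pd

def flatten(record):
    """Merge the inner dicts of one record into a single flat dict."""
    return {k: v for inner in record.values() for k, v in inner.items()}

def join_in_batches(df, data, batch_size=2000):
    """Flatten `data` batch by batch and join it to `df` by row position.

    Assumes len(data) == len(df), matching order, and a unique df index.
    """
    parts = []
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        # Reuse the matching slice of df's index so the join aligns correctly.
        idx = df.index[start:start + batch_size]
        parts.append(pd.DataFrame([flatten(r) for r in batch], index=idx))
    return df.join(pd.concat(parts))

# The original DataFrame from the question.
df = pd.DataFrame({
    "Name": ["John", "Thomas", "Anton", "Chris"],
    "Nationality": ["USA", "French", "Malaysia", "Argentina"],
    "Tall": [190, 194, 180, 190],
    "Age": [24, 25, 23, 26],
})

# Sample records using the values from this answer.
data = [
    {"a": {"lunch": "Apple", "breakfast": "Milk", "dinner": "Meatball"},
     "b": {"favourite": "coke", "dislike": "juice"}},
    {"a": {"lunch": "Apple1", "breakfast": "Milk1", "dinner": "Meatball2"},
     "b": {"favourite": "coke2", "dislike": "juice3"}},
    {"a": {"lunch": "Apple4", "breakfast": "Milk5", "dinner": "Meatball4"},
     "b": {"favourite": "coke2", "dislike": "juice4"}},
    {"a": {"lunch": "Apple3", "breakfast": "Milk8", "dinner": "Meatball7"},
     "b": {"favourite": "coke4", "dislike": "juice1"}},
]

# A small batch size is used here only to exercise more than one batch.
result = join_in_batches(df, data, batch_size=3)
print(result)
```

Building one DataFrame per batch and concatenating once at the end avoids repeatedly growing a DataFrame row by row, which is usually the slow part of an iterative approach.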
