How to concatenate DataFrames in pandas?

Time: 2015-12-04 15:07:30

Tags: python json mongodb pandas

I fetch data from MongoDB into Python with pymongo and then convert it to a pandas DataFrame:

df = pd.DataFrame(list(db.dataset2.find()))
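A minimal sketch of what this conversion produces, assuming a hypothetical in-memory list in place of the pymongo cursor (the integer _id values stand in for MongoDB ObjectIds): each document becomes one row, and the embedded dish array stays as a single Python list inside one cell.

```python
import pandas as pd

# Hypothetical stand-in for list(db.dataset2.find()): each MongoDB document
# becomes one dict, and "dish" holds a list of embedded dish documents.
docs = [
    {"_id": 1, "dish": [{"dish_id": "001", "dish_name": "Chicken Biryani", "dish_price": 120},
                        {"dish_id": "001", "dish_name": "Paneer Biryani", "dish_price": 100}]},
    {"_id": 2, "dish": [{"dish_id": "002", "dish_name": "Mutton Biryani", "dish_price": 130},
                        {"dish_id": "004", "dish_name": "Aaloo tikki", "dish_price": 95}]},
]

df = pd.DataFrame(docs)

# One row per document; the "dish" cell is still a plain list of dicts.
print(df.shape)             # (2, 2)
print(type(df["dish"][0]))  # <class 'list'>
```

This is why the dish list has to be flattened separately before each dish can become its own row.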

This is what the data looks like in MongoDB:

"dish" : [
      {
        "dish_id"          : "005",
        "dish_name"        : "Sandwitch",
        "dish_price"       : 50,
        "coupon_applied"   : "Yes",
        "coupon_type"      : "Rs 20 off"
      },
      {
        "dish_id"          : "006",
        "dish_name"        : "Chicken Hundi",
        "dish_price"       : 125,
        "coupon_applied"   : "No",
        "coupon_type"      : "Null"
      }
   ],

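I want to split the dish attribute into separate rows of the pandas DataFrame, one row per dish. There are 3 documents, each with its own dish list, so I iterate over them in a loop: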

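from pandas import json_normalize  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize

for i in range(0, len(df.dish)):
    data_dish = json_normalize(df['dish'][i])
    print(data_dish)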

But it gives me the output below, as three separate DataFrames:

  coupon_applied coupon_type dish_id       dish_name dish_price
0            Yes   Rs 20 off     001 Chicken Biryani        120
1             No        Null     001  Paneer Biryani        100

  coupon_applied coupon_type dish_id       dish_name dish_price
0            Yes   Rs 40 off     002  Mutton Biryani        130
1             No        Null     004     Aaloo tikki         95

  coupon_applied coupon_type dish_id       dish_name dish_price
0            Yes   Rs 20 off     005       Sandwitch         50
1             No        Null     006   Chicken Hundi        125

I want the output in the following format, as a single DataFrame:

  coupon_applied coupon_type dish_id       dish_name dish_price
0            Yes   Rs 20 off     001 Chicken Biryani        120
1             No        Null     001  Paneer Biryani        100
2            Yes   Rs 40 off     002  Mutton Biryani        130
3             No        Null     004     Aaloo tikki         95
4            Yes   Rs 20 off     005       Sandwitch         50
5             No        Null     006   Chicken Hundi        125
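The loop prints three separate DataFrames because each json_normalize call returns an independent frame whose index starts at 0. A minimal sketch with two hypothetical, trimmed dish lists (pandas >= 1.0 assumed, where json_normalize is available as pd.json_normalize) shows this, and why the 0 to 5 numbering in the desired output needs ignore_index=True when the pieces are concatenated.

```python
import pandas as pd

# Two hypothetical dish lists shaped like the "dish" arrays above.
doc_a = [{"dish_id": "001", "dish_price": 120}, {"dish_id": "001", "dish_price": 100}]
doc_b = [{"dish_id": "002", "dish_price": 130}, {"dish_id": "004", "dish_price": 95}]

# Each call builds an independent two-row DataFrame indexed 0, 1.
first = pd.json_normalize(doc_a)
second = pd.json_normalize(doc_b)
print(list(first.index), list(second.index))  # [0, 1] [0, 1]

# Stacking them keeps those labels, so 0 and 1 repeat...
stacked = pd.concat([first, second])
print(list(stacked.index))  # [0, 1, 0, 1]

# ...while ignore_index=True renumbers the rows 0..n-1.
renumbered = pd.concat([first, second], ignore_index=True)
print(list(renumbered.index))  # [0, 1, 2, 3]
```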

Can you help me with this? Thanks in advance :)

2 answers:

Answer 0 (score: 2)

dishes = [json_normalize(d) for d in df['dish']]
df = pd.concat(dishes, ignore_index=True)
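A self-contained sketch of this answer, assuming a hypothetical in-memory DataFrame in place of the MongoDB query (dish dicts trimmed to three fields) and pandas >= 1.0, where json_normalize is available as pd.json_normalize:

```python
import pandas as pd

# Hypothetical stand-in for pd.DataFrame(list(db.dataset2.find())).
df = pd.DataFrame([
    {"dish": [{"dish_id": "001", "dish_name": "Chicken Biryani", "dish_price": 120},
              {"dish_id": "001", "dish_name": "Paneer Biryani", "dish_price": 100}]},
    {"dish": [{"dish_id": "002", "dish_name": "Mutton Biryani", "dish_price": 130},
              {"dish_id": "004", "dish_name": "Aaloo tikki", "dish_price": 95}]},
    {"dish": [{"dish_id": "005", "dish_name": "Sandwitch", "dish_price": 50},
              {"dish_id": "006", "dish_name": "Chicken Hundi", "dish_price": 125}]},
])

# One small DataFrame per document's dish list...
dishes = [pd.json_normalize(d) for d in df["dish"]]

# ...then a single concat call that renumbers the rows 0..5.
result = pd.concat(dishes, ignore_index=True)
print(result)
```

Collecting the pieces in a list and calling pd.concat once is also cheaper than concatenating inside the loop, because every concatenation copies the data accumulated so far.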

Answer 1 (score: 0)

You should be able to collect the DataFrames in a list and then concatenate them.

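There is no need to initialize an empty result DataFrame first. Writing df = pd.DataFrame() at this point would overwrite the original df, so df.dish in the loop below would raise an AttributeError.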

Create an empty list to hold the DataFrames:

dflist = []

Loop over the dish lists and append each resulting DataFrame:

for i in range(0,len(df.dish)):
    data_dish = json_normalize(df['dish'][i])
    dflist.append(data_dish)

Then concatenate the list into one complete DataFrame:

df = pd.concat(dflist, ignore_index=True)
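A different approach from both answers, sketched under the assumption that the raw documents are still available as a list (for example list(db.dataset2.find())) and pandas >= 1.0: pd.json_normalize accepts a record_path argument that flattens every document's dish array in a single call, so neither the loop nor pd.concat is needed. The order_no field is hypothetical, added only to show how meta copies a top-level value onto each dish row.

```python
import pandas as pd

# Hypothetical stand-in for list(db.dataset2.find()); "order_no" is invented.
docs = [
    {"order_no": 1, "dish": [{"dish_id": "001", "dish_price": 120},
                             {"dish_id": "001", "dish_price": 100}]},
    {"order_no": 2, "dish": [{"dish_id": "002", "dish_price": 130},
                             {"dish_id": "004", "dish_price": 95}]},
    {"order_no": 3, "dish": [{"dish_id": "005", "dish_price": 50},
                             {"dish_id": "006", "dish_price": 125}]},
]

# record_path="dish" emits one row per element of each document's "dish" list;
# meta copies the listed top-level fields onto those rows.
flat = pd.json_normalize(docs, record_path="dish", meta=["order_no"])
print(flat)
```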