Pandas: merge a list of DataFrames, grouped by a column value

Time: 2020-10-15 11:14:21

Tags: python pandas numpy dataframe

I have a list of pandas DataFrames, each with the same columns:

import pandas as pd

df1_values = [["2001-01-01", "Lime", 10], ["2001-01-02", "Lime", 20]]
df2_values = [["2001-01-01", "Mango", 40], ["2001-01-02", "Mango", 50], ["2001-01-03", "Mango", 60]]
df3_values = [["2001-01-01", "Orange", 30]]
df1 = pd.DataFrame(df1_values, columns=["date", "fruit", "value"])
df2 = pd.DataFrame(df2_values, columns=["date", "fruit", "value"])
df3 = pd.DataFrame(df3_values, columns=["date", "fruit", "value"])
dfs = [df1, df2, df3]

One of the sample DataFrames, df1:

         date fruit  value
0  2001-01-01  Lime     10
1  2001-01-02  Lime     20

I am trying to merge all the DataFrames in the list into the following format (grouped by date). Expected output:

    date         fruit  value
  2001-01-01     Lime    10
  2001-01-01     Mango   40
  2001-01-01     Orange  30
  2001-01-02     Lime    20
  2001-01-02     Mango   50
  2001-01-03     Mango   60

My current iterative approach:

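date_dict = {}
for each_date in ["2001-01-01", "2001-01-02", "2001-01-03"]:
    for each_df in dfs:
        # rows of this DataFrame that fall on the current date
        rows = each_df[each_df["date"] == each_date]
        if each_date in date_dict:
            # append the values for this date
            date_dict[each_date] = pd.concat([date_dict[each_date], rows])
        else:
            # enter the values for this date
            date_dict[each_date] = rows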

It works, but it takes a long time.

The pandas approach I tried:

from functools import reduce
df_merged = reduce(
    lambda left, right: pd.merge(left, right, on=["fruit"], how="outer"),
    dfs,
)

Output:

     date_x    fruit    value_x date_y     value_y  date    value
0   2001-01-01  Lime    10.0    NaN         NaN     NaN     NaN
1   2001-01-02  Lime    20.0    NaN         NaN     NaN     NaN
2   NaN         Mango   NaN    2001-01-01   40.0    NaN     NaN
3   NaN         Mango   NaN    2001-01-02   50.0    NaN     NaN
4   NaN         Mango   NaN    2001-01-03   60.0    NaN     NaN
5   NaN         Orange  NaN     NaN         NaN  2001-01-01 30.0

Any suggestions on how to fix this would be appreciated.

1 Answer:

Answer 0: (score: 3)

You can stack the DataFrames with pandas.concat and then order the rows with .sort_values:

print( pd.concat(dfs).sort_values('date') )

Prints:

         date   fruit  value
0  2001-01-01    Lime     10
0  2001-01-01   Mango     40
0  2001-01-01  Orange     30
1  2001-01-02    Lime     20
1  2001-01-02   Mango     50
2  2001-01-03   Mango     60
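
Note that the index above repeats (0, 0, 0, 1, 1, 2) because each frame keeps its own row labels, and that sorting on date alone with the default sort algorithm is not guaranteed to keep a particular order within the same date. As a minimal sketch of one way to tidy this up, assuming you want a fresh 0..n-1 index and rows ordered by date and then by fruit (the names `stacked` and `result` are only illustrative):

```python
import pandas as pd

# Same sample data as in the question.
cols = ["date", "fruit", "value"]
df1 = pd.DataFrame([["2001-01-01", "Lime", 10], ["2001-01-02", "Lime", 20]], columns=cols)
df2 = pd.DataFrame(
    [["2001-01-01", "Mango", 40], ["2001-01-02", "Mango", 50], ["2001-01-03", "Mango", 60]],
    columns=cols,
)
df3 = pd.DataFrame([["2001-01-01", "Orange", 30]], columns=cols)
dfs = [df1, df2, df3]

# Stack the frames row-wise and discard each frame's original index.
stacked = pd.concat(dfs, ignore_index=True)

# Sort by date, then by fruit so the order within a date is deterministic,
# and renumber the rows from 0.
result = stacked.sort_values(["date", "fruit"]).reset_index(drop=True)
print(result)
```

The dates here are ISO-formatted strings, so sorting them as text gives chronological order. If your real data uses another date format, converting the column with pd.to_datetime before sorting is probably the safer choice.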