I have a list of pandas DataFrames, each with the same columns:
import pandas as pd

df1_values = [["2001-01-01", "Lime", 10], ["2001-01-02", "Lime", 20]]
df2_values = [["2001-01-01", "Mango", 40], ["2001-01-02", "Mango", 50], ["2001-01-03", "Mango", 60]]
df3_values = [["2001-01-01", "Orange", 30]]
df1 = pd.DataFrame(df1_values, columns=["date", "fruit", "value"])
df2 = pd.DataFrame(df2_values, columns=["date", "fruit", "value"])
df3 = pd.DataFrame(df3_values, columns=["date", "fruit", "value"])
dfs = [df1, df2, df3]
One of the example DataFrames, df1:
date fruit value
0 2001-01-01 Lime 10
1 2001-01-02 Lime 20
I am trying to merge all the DataFrames in the list into the following format (grouped by date). Expected output:
date fruit value
2001-01-01 Lime 10
2001-01-01 Mango 40
2001-01-01 Orange 30
2001-01-02 Lime 20
2001-01-02 Mango 50
2001-01-03 Mango 60
Current iterative approach:
date_dict = {}
for each_date in ["2001-01-01", "2001-01-02", "2001-01-03"]:
    for each_df in dfs:
        if each_date in date_dict:
            # append the values for this date
            pass
        else:
            # enter the values for this date
            pass
It works, but it takes a long time.
pandas approach:
from functools import reduce
df_merged = reduce(
    lambda left, right: pd.merge(left, right, on=["fruit"], how="outer"),
    dfs,
)
Output:
date_x fruit value_x date_y value_y date value
0 2001-01-01 Lime 10.0 NaN NaN NaN NaN
1 2001-01-02 Lime 20.0 NaN NaN NaN NaN
2 NaN Mango NaN 2001-01-01 40.0 NaN NaN
3 NaN Mango NaN 2001-01-02 50.0 NaN NaN
4 NaN Mango NaN 2001-01-03 60.0 NaN NaN
5 NaN Orange NaN NaN NaN 2001-01-01 30.0
Any suggestions on how to fix this would be helpful.
Answer 0 (score: 3)
You can use pandas.concat first, then .sort_values:
print( pd.concat(dfs).sort_values('date') )
Prints:
date fruit value
0 2001-01-01 Lime 10
0 2001-01-01 Mango 40
0 2001-01-01 Orange 30
1 2001-01-02 Lime 20
1 2001-01-02 Mango 50
2 2001-01-03 Mango 60
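If the order of fruits within each date and a clean 0..n-1 index also matter, one possible variation is sketched below. It is not part of the original answer: sorting on both "date" and "fruit" and calling reset_index(drop=True) are assumptions about what the final table should look like. The dates are kept as strings here, which sort correctly only because they are in ISO YYYY-MM-DD form.

```python
import pandas as pd

# Rebuild the sample frames from the question.
columns = ["date", "fruit", "value"]
df1 = pd.DataFrame([["2001-01-01", "Lime", 10], ["2001-01-02", "Lime", 20]], columns=columns)
df2 = pd.DataFrame(
    [["2001-01-01", "Mango", 40], ["2001-01-02", "Mango", 50], ["2001-01-03", "Mango", 60]],
    columns=columns,
)
df3 = pd.DataFrame([["2001-01-01", "Orange", 30]], columns=columns)
dfs = [df1, df2, df3]

# Stack the frames row-wise, sort by date, break ties by fruit,
# then discard the per-frame index values (0, 0, 0, 1, 1, 2).
combined = (
    pd.concat(dfs)
    .sort_values(["date", "fruit"])
    .reset_index(drop=True)
)
print(combined)
```

Sorting on a second key makes the order within each date explicit instead of depending on how ties happen to fall out of the sort.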