Suppose I have two pandas DataFrames:
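In [1]: import numpy as np
import pandas as pd

dates = pd.date_range('20170101', periods=6)
# Label frame: object dtype so it can hold the strings 'A' and 'B'
df1 = pd.DataFrame(index=dates, columns=['foo', 'bar'], dtype=object)
# Assign by position with iloc (column 0 is 'foo', column 1 is 'bar')
df1.iloc[0:2, 0] = 'A'
df1.iloc[0:3, 1] = 'A'
df1.iloc[2:6, 0] = 'B'
df1.iloc[3:6, 1] = 'B'
# Random integers in [0, 10), so the values printed below change from run to run
df2 = pd.DataFrame(np.random.randint(10, size=(6, 2)), index=dates, columns=df1.columns)
print(df1)
print(df2)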
Out [1]:
           foo bar
2017-01-01   A   A
2017-01-02   A   A
2017-01-03   B   A
2017-01-04   B   B
2017-01-05   B   B
2017-01-06   B   B
            foo  bar
2017-01-01    5    3
2017-01-02    6    9
2017-01-03    5    9
2017-01-04    7    5
2017-01-05    0    2
2017-01-06    0    0
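I would like to create a third DataFrame in which every value of df2 is replaced by the maximum of df2 over the rows that share the same label in df1, computed column by column. For example, the output would look like this: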
            foo  bar
2017-01-01    6    9
2017-01-02    6    9
2017-01-03    7    9
2017-01-04    7    5
2017-01-05    7    5
2017-01-06    7    5
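Surely there is a concise way to do this, right?
Answer 0 (score: 1)
One option is to concatenate the two DataFrames, assigning each one a key, reshape the result to long format, and then compute the maximum grouped by key and column name: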
(pd.concat([df1, df2], keys=["one", "two"], axis=1)
   .stack(level=1)
   .groupby(level=1)
   .apply(lambda g: g.groupby("one", as_index=False)["two"].transform("max"))
   .two
   .unstack(level=1))
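The same long-format idea can also be written with a single groupby, which may be easier to follow than the nested apply. This is only a minimal sketch, not the answer's exact code: the df2 values are hard-coded from the output printed in the question (they are random there), and the names long_df, group_max and the index level names "date" and "column" are my own assumptions.

```python
import pandas as pd

# Fixed copies of the frames printed in the question (df2 is random there).
dates = pd.date_range("20170101", periods=6)
df1 = pd.DataFrame({"foo": list("AABBBB"), "bar": list("AAABBB")}, index=dates)
df2 = pd.DataFrame({"foo": [5, 6, 5, 7, 0, 0], "bar": [3, 9, 9, 5, 2, 0]}, index=dates)

# Long format: one row per (date, column) pair, labels in "one", values in "two".
long_df = pd.concat([df1.stack(), df2.stack()], keys=["one", "two"], axis=1)
long_df.index.names = ["date", "column"]  # hypothetical level names

# Max of "two" within each (column name, label) group, broadcast to every row.
group_max = long_df.groupby(["column", "one"])["two"].transform("max")

# Back to wide format: dates as rows, original column names as columns.
result = group_max.unstack("column")
print(result)
```

With the values from the question, this should reproduce the expected table shown there. Grouping on the pair (column name, label) keeps the 'A' rows of foo separate from the 'A' rows of bar.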
Answer 1 (score: 0)
You can add the column names to the values of df1, which gives keys for the groupby that are unique per column (for example 'Afoo' and 'Abar'). Keep non-string values in mind: the addition relies on string concatenation, so df1 must hold strings.
df2.stack().groupby(
df1.add(df1.columns.to_series()).stack()
).transform('max').unstack()
            foo  bar
2017-01-01    6    9
2017-01-02    6    9
2017-01-03    7    9
2017-01-04    7    5
2017-01-05    7    5
2017-01-06    7    5
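If the labels in df1 are not strings, one way around the concatenation is to group each column separately. The sketch below is a different technique from the stacked groupby above, not a rewrite of it: it loops over the columns and groups each df2 column by the matching df1 column. The df2 values are again hard-coded from the question's printed output, and the name result is my own.

```python
import pandas as pd

# Fixed copies of the frames printed in the question (df2 is random there).
dates = pd.date_range("20170101", periods=6)
df1 = pd.DataFrame({"foo": list("AABBBB"), "bar": list("AAABBB")}, index=dates)
df2 = pd.DataFrame({"foo": [5, 6, 5, 7, 0, 0], "bar": [3, 9, 9, 5, 2, 0]}, index=dates)

# For each column, group the df2 values by the df1 labels of the same column
# and broadcast each group's maximum back to its rows.
result = pd.DataFrame(
    {col: df2[col].groupby(df1[col]).transform("max") for col in df2.columns}
)
print(result)
```

Because the labels are used only as group keys and are never concatenated, this should work for any hashable label type, at the cost of one groupby per column.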