I have several dataframes that have months as columns and contain numeric values. I'm posting two of them for this example.
df1 =
            June 2016  July 2016
Flavor
Vanilla          17.0       23.0
Chocolate         7.0       12.0
Strawberry       11.0       14.0
df2 =
           June 2016  July 2016
Flavor
Vanilla          9.0       19.0
Chocolate       10.0        3.0
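How can I iterate over the dataframes and perform a calculation based on the row and column names, where the labels have to match across dataframes? For example, I want to compute the average for Vanilla in July, i.e. (23 + 19) / 2. If a Flavor does not exist in one of the dataframes, I also want to assign a constant value for every month in that dataframe (15 in this example). Should I append the dataframes together and then apply .mean()?
Thanks in advance, and apologies for any abruptness, I'm currently traveling.
Thanks!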
Answer 0: (score: 2)
Use groupby on the columns:
pd.concat([df1, df2], axis=1).fillna(15).groupby(level=0, axis=1).mean()
Out[408]:
            July 2016  June 2016
Chocolate         7.5        8.5
Strawberry       14.5       13.0
Vanilla          21.0       13.0
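As far as I know, recent pandas releases deprecate grouping along columns with axis=1 and suggest transposing instead. The following is a minimal sketch of the same idea under that assumption: it rebuilds the two example frames inline (rather than parsing text), concatenates them side by side, fills the gaps with 15, and groups the transposed frame by month label. The names `combined` and `result` are only illustrative.

```python
import pandas as pd

months = ["June 2016", "July 2016"]
df1 = pd.DataFrame(
    [[17.0, 23.0], [7.0, 12.0], [11.0, 14.0]],
    index=pd.Index(["Vanilla", "Chocolate", "Strawberry"], name="Flavor"),
    columns=months,
)
df2 = pd.DataFrame(
    [[9.0, 19.0], [10.0, 3.0]],
    index=pd.Index(["Vanilla", "Chocolate"], name="Flavor"),
    columns=months,
)

# Side-by-side concat aligns on Flavor; a flavor missing from a frame becomes NaN.
combined = pd.concat([df1, df2], axis=1).fillna(15)

# Transpose so the (duplicated) month labels form the index, average per month,
# then transpose back so flavors are rows again.
result = combined.T.groupby(level=0).mean().T

print(result.loc["Vanilla", "July 2016"])  # (23 + 19) / 2
```

Note that fillna(15) replaces every NaN, so it would also overwrite values that were already missing inside a frame, not only the rows introduced by alignment.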
Answer 1: (score: 0)
Consider a directly vectorized approach, since you can run arithmetic operations across similarly structured dataframes:
(df1 + df2.reindex(labels=df1.index.values, fill_value=15)) / 2
# June 2016 July 2016
# Flavor
# Vanilla 13.0 21.0
# Chocolate 8.5 7.5
# Strawberry 13.0 14.5
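For the two-frame case, a possible alternative is DataFrame.add with its fill_value argument, which substitutes the constant for a cell that is missing on one side before adding. This is only a sketch; the frames are rebuilt inline so it runs on its own, and `average` is an illustrative name.

```python
import pandas as pd

months = ["June 2016", "July 2016"]
df1 = pd.DataFrame(
    [[17.0, 23.0], [7.0, 12.0], [11.0, 14.0]],
    index=pd.Index(["Vanilla", "Chocolate", "Strawberry"], name="Flavor"),
    columns=months,
)
df2 = pd.DataFrame(
    [[9.0, 19.0], [10.0, 3.0]],
    index=pd.Index(["Vanilla", "Chocolate"], name="Flavor"),
    columns=months,
)

# Strawberry is absent from df2, so 15 stands in for it before the addition.
# The result covers the union of both indexes, whichever frame is on the left.
average = df1.add(df2, fill_value=15) / 2

print(average.loc["Strawberry"])
```

A cell that is missing in both frames stays NaN, so this does not invent values for flavors that appear in neither frame.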
For many dataframes in a list, consider reduce:
from functools import reduce
df_list = [df1, df2]
new_df_list = [d.reindex(labels=df1.index.values, fill_value=15) for d in df_list]
reduce(lambda x,y: x + y, new_df_list) / len(new_df_list)
# June 2016 July 2016
# Flavor
# Vanilla 13.0 21.0
# Chocolate 8.5 7.5
# Strawberry 13.0 14.5
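One caveat with reindexing everything to df1.index is that a flavor appearing only in a later frame would be dropped. A hedged sketch of one way around this is to build the union of all indexes first and reindex against that. The third frame, `df3`, and its "Mint" row are hypothetical and exist only to illustrate the case; they are not part of the question's data.

```python
from functools import reduce

import pandas as pd

months = ["June 2016", "July 2016"]

def flavor_frame(rows):
    """Build a small frame indexed by Flavor from {flavor: [june, july]}."""
    frame = pd.DataFrame.from_dict(rows, orient="index", columns=months)
    frame.index.name = "Flavor"
    return frame

df1 = flavor_frame({"Vanilla": [17.0, 23.0], "Chocolate": [7.0, 12.0],
                    "Strawberry": [11.0, 14.0]})
df2 = flavor_frame({"Vanilla": [9.0, 19.0], "Chocolate": [10.0, 3.0]})
# Hypothetical third frame with a flavor that df1 does not have.
df3 = flavor_frame({"Vanilla": [10.0, 18.0], "Mint": [6.0, 9.0]})

df_list = [df1, df2, df3]

# Union of every frame's flavors, so no row is lost.
all_flavors = reduce(lambda a, b: a.union(b), (d.index for d in df_list))

# Each frame gets the full set of flavors; absent ones are filled with 15.
aligned = [d.reindex(all_flavors, fill_value=15) for d in df_list]

average = reduce(lambda x, y: x + y, aligned) / len(aligned)
print(average)
```

Because every aligned frame shares the same index and columns, the plain + inside reduce never produces NaN from misalignment.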
Data
import pandas as pd
from io import StringIO
txt = '''
Flavor "June 2016" "July 2016"
Vanilla 17.0 23.0
Chocolate 7.0 12.0
Strawberry 11.0 14.0'''
df1 = pd.read_table(StringIO(txt), sep=r"\s+", index_col=0)
txt = '''
Flavor "June 2016" "July 2016"
Vanilla 9.0 19.0
Chocolate 10.0 3.0'''
df2 = pd.read_table(StringIO(txt), sep=r"\s+", index_col=0)