I have 2 data frames.
The first one looks like this:
Month DayOfWeek Class A1 A2 ... A999
July Monday Bata 7 9 ... 5
July Tuesday Bata 3 1 ... 2
July Sunday Bata 4 5 ... 6
July Monday Adid 9 8 ... 5
July Sunday Adid 4 0 ... 4
Sept Monday Nike 7 5 ... 7
Sept Sunday Nike 8 3 ... 7
Sept Satday Adid 2 7 ... 7
Sept Monday Bata 8 9 ... 4
Oct Monday Nike 4 2 ... 5
Oct Sunday Bata 8 6 ... 3
My second data frame looks like this:
Month DayOfWeek Class A1 A2 ... A999
Jul Monday Bata 5 7 8
Oct Monday Adid 1 2 3
Sep Monday Bata 3 7 6
Sep Monday Nike 8 3 8
Jul Monday Adid NaN NaN NaN
Sep Sunday Nike NaN NaN NaN
Oct Satday Nike NaN NaN NaN
Sep Monday Bata NaN NaN NaN
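The first data frame, df1, has no NaN values. In the second, df2, nearly half of the rows are NaN in columns A1 through A999.
The number of columns varies: it could be A1 to A10, or A1 to A2567, or any other number of columns.
I want to fill these NaNs in df2 with the mean of the df1 rows that have the same Month and DayOfWeek.
I posted another question earlier, but the situation has changed: the data is now split into 2 data frames with an unknown number of columns.
So far I have this: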
Mth = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
Wk = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
for m in Mth:
    for w in Wk:
        print(w, m, df[(df["Month"] == m) & (df["DayOfWeek"] == w)].mean())
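I don't know where to go from here, or how to apply this to all columns without specifying the column names.
This is the result I want: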
Month DayOfWeek Class A1 A2 ... A999
Jul Monday Bata 5 7 8
Oct Monday Adid 1 2 3
Sep Monday Bata 3 7 6
Sep Monday Nike 8 3 8
Jul Monday Adid NaN NaN NaN <--- Avg of Monday Jul in df1 for each column
Sep Sunday Nike NaN NaN NaN <--- Avg of Sunday Sep in df1 for each column
Oct Satday Nike NaN NaN NaN <--- Avg of Satday Oct in df1 for each column
Sep Monday Bata NaN NaN NaN <--- Avg of Monday Sep in df1 for each column
How can I do this?
Answer 0 (score: 1)
I think this might work:
result = pd.concat([df1, df2]).groupby(['Month','DayOfWeek','Class'], as_index=False,axis=0).mean().dropna()
The output looks something like:
Month DayOfWeek Class A1 A2 A999
2 July Monday Adid 9.0 8.0 5.0
3 July Monday Bata 7.0 9.0 5.0
4 July Sunday Adid 4.0 0.0 4.0
5 July Sunday Bata 4.0 5.0 6.0
6 July Tuesday Bata 3.0 1.0 2.0
8 Oct Monday Nike 4.0 2.0 5.0
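concat combines the data frames. It aligns them on column names, so frames with different sets of columns can be stacked, with missing entries becoming NaN. I assume you want to group by Month, DayOfWeek and Class. In the groupby call, as_index=False keeps the grouping keys as regular columns and axis=0 (the default) groups along rows. Grouping by Month, DayOfWeek and Class creates a row for every combination that occurs, including ones like this: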
Month DayOfWeek Class A1 A2 A999
0 Jul Monday Adid NaN NaN NaN
In this particular case there is no data, so the row is of no interest in the output; the solution is to add dropna() at the end.
Hope this helps.
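To make the behaviour of the one-liner above concrete, here is a minimal sketch on two tiny made-up frames (the values are illustrative, not the asker's data). It suggests that the result is one summary row per (Month, DayOfWeek, Class) group, with NaN skipped when averaging, rather than df2 with its gaps filled in place. The `axis=0` argument is left out here because it is the default.

```python
import numpy as np
import pandas as pd

# Made-up frames for illustration only.
df1 = pd.DataFrame({"Month": ["Jul", "Jul"], "DayOfWeek": ["Monday", "Monday"],
                    "Class": ["Bata", "Bata"], "A1": [7.0, 9.0]})
df2 = pd.DataFrame({"Month": ["Jul", "Sep"], "DayOfWeek": ["Monday", "Sunday"],
                    "Class": ["Bata", "Nike"], "A1": [np.nan, np.nan]})

# Stack the rows of both frames, then average each group; NaN is skipped by mean().
combined = pd.concat([df1, df2], ignore_index=True)
result = (combined
          .groupby(["Month", "DayOfWeek", "Class"], as_index=False)
          .mean()
          .dropna())  # drops groups whose only values were NaN
print(result)
```

The (Sep, Sunday, Nike) group has no values in df1, so its mean is NaN and dropna() removes it.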
Answer 1 (score: 1)
You can use groupby, merge and update, as shown below.
Generate dummy data:
import numpy as np
import pandas as pd

Mth = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
Wk = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def generate(nan=False):
    # 20 rows x 20 value columns of random numbers
    values = np.random.rand(20, 20)
    if nan:
        # Blank out roughly a third of the cells
        nan_mask = np.random.choice([False, False, True], (20, 20))
        values[nan_mask] = np.nan
    df = pd.DataFrame(values, columns=[f"A{i}" for i in range(values.shape[1])])
    # Random key columns, placed in front of the value columns
    df_ = pd.DataFrame()
    df_["Month"] = np.random.choice(Mth, 20)
    df_["DayOfWeek"] = np.random.choice(Wk, 20)
    df = pd.concat([df_, df], sort=False, axis=1)
    return df

df1 = generate()
df2 = generate(True)
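Solution: first compute the mean for each Month/DayOfWeek combination, then merge the means onto the key columns of the data to be filled so that they share its index, then update that data with the means.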
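# Mean of every value column for each (Month, DayOfWeek) combination in df1
means = df1.groupby(["Month", "DayOfWeek"]).mean().reset_index()
# One row of means per df2 row, in df2's row order, so the indexes line up for update()
means = df2[["Month", "DayOfWeek"]].merge(means, how="left", on=["Month", "DayOfWeek"])
display(df2)  # display() is available in Jupyter; use print() elsewhere
df3 = df2.copy()
# overwrite=False only fills cells that are NaN in df3
df3.update(means, overwrite=False)
display(df3)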
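The same idea can be sketched on frames shaped like the ones in the question, with a Class column and an arbitrary number of A-columns. This is only an illustration under a few assumptions: the values are made up, `value_cols`, `lookup` and `filled` are names chosen here, and every column other than Month, DayOfWeek and Class is assumed to be a numeric value column. It uses `join` plus `fillna` in place of `merge` plus `update`.

```python
import numpy as np
import pandas as pd

# Toy frames shaped like the question's data (values are made up for illustration).
df1 = pd.DataFrame({
    "Month": ["Jul", "Jul", "Sep", "Sep"],
    "DayOfWeek": ["Monday", "Monday", "Monday", "Sunday"],
    "Class": ["Bata", "Adid", "Nike", "Nike"],
    "A1": [7.0, 9.0, 7.0, 8.0],
    "A2": [9.0, 8.0, 5.0, 3.0],
})
df2 = pd.DataFrame({
    "Month": ["Jul", "Sep", "Jul", "Sep"],
    "DayOfWeek": ["Monday", "Monday", "Monday", "Sunday"],
    "Class": ["Bata", "Nike", "Adid", "Nike"],
    "A1": [5.0, 8.0, np.nan, np.nan],
    "A2": [7.0, 3.0, np.nan, np.nan],
})

keys = ["Month", "DayOfWeek"]
# Assumption: everything that is not a key or the Class label is a value column,
# so the number of A-columns never has to be spelled out.
value_cols = df2.columns.difference(keys + ["Class"])

# Per-(Month, DayOfWeek) column means from df1.
means = df1.groupby(keys)[list(value_cols)].mean()

# Look up the group mean for each df2 row; the result keeps df2's index.
lookup = df2[keys].join(means, on=keys)

# Fill only the NaN cells; existing values in df2 are left untouched.
filled = df2.copy()
filled[value_cols] = df2[value_cols].fillna(lookup[value_cols])
print(filled)
```

The lookup only works when both frames spell the keys the same way. In the question's sample, df1 uses July and Sept while df2 uses Jul and Sep, so the month labels would need to be normalized first. Rows whose Month/DayOfWeek combination never occurs in df1 stay NaN.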