I have a DataFrame df3 like the one below.
The AAA_??? columns are not known in advance; they can be anything in the dataset.
Date ID Calendar_Year Month DayName... AAA_1E AAA_BMITH AAA_4.1 AAA_CH
0 2019-09-17 8661 2019 Sep Sun... NaN NaN NaN NaN
1 2019-09-18 8662 2019 Sep Sun... 1.0 3.0 34.0 1.0
2 2019-09-19 8663 2019 Sep Sun... NaN NaN NaN NaN
3 2019-09-20 8664 2019 Sep Mon... NaN NaN NaN NaN
4 2019-09-20 8664 2019 Sep Mon... 2.0 4.0 32.0 3.0
5 2019-09-20 8664 2019 Sep Sat... NaN NaN NaN NaN
6 2019-09-20 8664 2019 Sep Sat... NaN NaN NaN NaN
7 2019-09-20 8664 2019 Sep Sat... 0.0 4.0 30.0 0.0
I also have another DataFrame, dfMeans, which holds the mean values computed from df3:
Month Dayname ID ... AAA_BMITH AAA_4.1 AAA_CH
0 Jan Thu 7686.500000 ... 0.000000 28.045455 0.0
1 Jan Fri 7636.272727 ... 0.000000 28.136364 0.0
2 Jan Sat 7637.272727 ... 0.000000 27.045455 0.0
3 Jan Sun 7670.090909 ... 0.000000 27.090909 0.0
4 Jan Mon 7702.909091 ... 0.000000 27.727273 0.0
5 Jan Tue 7734.260870 ... 0.000000 27.956522 0.0
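The two DataFrames are to be matched on month and day name.
I want to replace the NaN values in df3 with the values from dfMeans,
using this line: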
df3.update(dfMeans, overwrite=False, errors="raise")
But I get this error:
raise ValueError("Data overlaps.")
ValueError: Data overlaps.
How can I fill the NaN values with the values from dfMeans and avoid this error?
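For context, here is a minimal sketch of where this error comes from. It uses toy data rather than the real df3 and dfMeans; the frames `left` and `right` are made up for illustration. With errors="raise", DataFrame.update raises ValueError as soon as both frames hold a non-NaN value at the same row label and column, whatever overwrite is set to. Note also that update aligns on the row index, so even without the error it would not match rows by month and day name.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frames sharing one column and the default 0..n row index.
left = pd.DataFrame({"AAA_4.1": [np.nan, 34.0]})
right = pd.DataFrame({"AAA_4.1": [27.0, 28.0]})

# Row 1 is non-NaN in both frames, so errors="raise" refuses to proceed.
try:
    left.update(right, overwrite=False, errors="raise")
    raised = False
    message = ""
except ValueError as exc:
    raised = True
    message = str(exc)

print(raised, message)

# With the default errors="ignore", overwrite=False fills only the NaN cells.
left.update(right, overwrite=False)
print(left)
```

The second call fills by row label only, which is why the answers below align on Month and day name instead.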
Edit:
I have put all the data in a single DataFrame df:
Month Dayname ID ... AAA_BMITH AAA_4.1 AAA_CH
0 Jan Thu 7686.500000 ... 0.000000 28.045455 0.0
1 Jan Fri 7636.272727 ... 0.000000 28.136364 0.0
2 Jan Sat 7637.272727 ... 0.000000 27.045455 0.0
3 Jan Sun 7670.090909 ... 0.000000 27.090909 0.0
4 Jan Mon 7702.909091 ... 0.000000 27.727273 0.0
5 Jan Tue 7734.260870 ... 0.000000 27.956522 0.0
How can I fill the NaN values with the mean for the matching month and day name?
Answer 0 (score: 2)
Use fillna. Starting from df:
Date ID Calendar_Year Month Dayname AAA_1E AAA_BMITH AAA_4.1 AAA_CH
2019-09-17 8661 2019 Jan Sun NaN NaN NaN NaN
2019-09-18 8662 2019 Jan Sun 1.0 3.0 34.0 1.0
2019-09-19 8663 2019 Jan Sun NaN NaN NaN NaN
2019-09-20 8664 2019 Jan Mon NaN NaN NaN NaN
2019-09-20 8664 2019 Jan Mon 2.0 4.0 32.0 3.0
2019-09-20 8664 2019 Jan Sat NaN NaN NaN NaN
2019-09-20 8664 2019 Jan Sat NaN NaN NaN NaN
2019-09-20 8664 2019 Jan Sat 0.0 4.0 30.0 0.0
df.set_index(['Month', 'Dayname'], inplace=True)
df_mean:
Month Dayname ID AAA_BMITH AAA_4.1 AAA_CH
Jan Thu 7686.500000 0.0 28.045455 0.0
Jan Fri 7636.272727 0.0 28.136364 0.0
Jan Sat 7637.272727 0.0 27.045455 0.0
Jan Sun 7670.090909 0.0 27.090909 0.0
Jan Mon 7702.909091 0.0 27.727273 0.0
Jan Tue 7734.260870 0.0 27.956522 0.0
df_mean.set_index(['Month', 'Dayname'], inplace=True)
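Note that AAA_1E is in df but not in df_mean, so the loop below skips it:
for col in df.columns:
    if col in df_mean.columns:
        # Assign back; fillna(inplace=True) on a column selection may not update df
        df[col] = df[col].fillna(df_mean[col])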
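A self-contained sketch of this index-alignment approach, on toy data invented for illustration (the values are not the asker's real data). It assumes both frames use the same key column names, Month and Dayname. The sketch assigns the filled column back rather than calling fillna(inplace=True) on a column selection, since newer pandas versions may not propagate that to the parent frame.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for df and df_mean; the numbers are made up.
df = pd.DataFrame({
    "Month": ["Jan", "Jan", "Jan", "Jan"],
    "Dayname": ["Sun", "Sun", "Mon", "Mon"],
    "AAA_1E": [np.nan, 1.0, np.nan, 2.0],    # has no counterpart in df_mean
    "AAA_4.1": [np.nan, 34.0, np.nan, 32.0],
})
df_mean = pd.DataFrame({
    "Month": ["Jan", "Jan"],
    "Dayname": ["Sun", "Mon"],
    "AAA_4.1": [27.0, 28.0],
})

keys = ["Month", "Dayname"]
df = df.set_index(keys)
df_mean = df_mean.set_index(keys)  # (Month, Dayname) must be unique here

# fillna with a Series aligns the fill values on the index labels,
# so each NaN receives the mean for its own (Month, Dayname) pair.
for col in df.columns.intersection(df_mean.columns):
    df[col] = df[col].fillna(df_mean[col])

df = df.reset_index()
print(df)
```

Columns that exist only in df, such as AAA_1E here, keep their NaN values because there is no mean to fill them with.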
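Answer 1 (score: 1)
You can group by 'Month' and 'DayName' with groupby, then use apply to edit each group of the DataFrame.
Use fillna to fill the NaN values. fillna accepts a dictionary as its value argument: the keys are column names and the values are scalars, each of which replaces the NaN entries in its column. With loc, you can select the appropriate value from dfMeans.
You can build the dictionary with a dict comprehension over the intersection of the columns of df3 and dfMeans.
All of this corresponds to the following statement: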
df3filled = df3.groupby(['Month', 'DayName']).apply(
    lambda x: x.fillna({
        col: dfMeans.loc[(dfMeans['Month'] == x.name[0]) & (dfMeans['Dayname'] == x.name[1]), col].iloc[0]
        for col in x.columns.intersection(dfMeans.columns)
    })
).reset_index(drop=True)
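The same dictionary-based fill can be sketched step by step on toy data (values invented for illustration). This variant is an assumption-laden rewrite rather than the statement above: it iterates over the groups explicitly instead of using groupby().apply(), because the handling of grouping columns inside apply differs between pandas versions, and it leaves a group untouched when dfMeans has no matching row.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for df3 and dfMeans; note the differing key spelling
# ("DayName" vs "Dayname"), as in the question.
df3 = pd.DataFrame({
    "Month": ["Jan", "Jan", "Jan", "Jan"],
    "DayName": ["Sun", "Sun", "Mon", "Mon"],
    "AAA_1E": [np.nan, 1.0, np.nan, 2.0],
    "AAA_4.1": [np.nan, 34.0, np.nan, 32.0],
})
dfMeans = pd.DataFrame({
    "Month": ["Jan", "Jan"],
    "Dayname": ["Sun", "Mon"],
    "AAA_4.1": [27.0, 28.0],
})

# Only value columns present in both frames can be filled.
shared = df3.columns.intersection(dfMeans.columns).difference(["Month"])

filled_groups = []
for (month, day), group in df3.groupby(["Month", "DayName"], sort=False):
    row = dfMeans[(dfMeans["Month"] == month) & (dfMeans["Dayname"] == day)]
    if len(row):
        # {column name: scalar mean} for this (month, day) pair
        values = {col: row[col].iloc[0] for col in shared}
        group = group.fillna(values)
    filled_groups.append(group)

# Restore the original row order.
df3filled = pd.concat(filled_groups).sort_index()
print(df3filled)
```

Excluding the key column from `shared` keeps the dictionary limited to value columns, so the keys themselves are never overwritten.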