I have a df like this:
A B C D E
1 2 3 0 2
2 0 7 1 1
3 4 0 3 0
0 0 3 4 3
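I'm trying to replace every 0 with the mean() of the values in the same column, from the first row down to and including the row that holds the 0 (a value already replaced above it counts with its new value).
My expected output is: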
A B C D E
1.0 2.00 3.000000 0.0 2.0
2.0 1.00 7.000000 1.0 1.0
3.0 4.00 3.333333 3.0 1.0
1.5 1.75 3.000000 4.0 3.0
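To reproduce the question, here is a minimal sketch that builds the sample frame by hand, with the column values read off the table above. It also works through column B, the only column with two zeros, to show that the second mean uses the first replacement:

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    'A': [1, 2, 3, 0],
    'B': [2, 0, 4, 0],
    'C': [3, 7, 0, 3],
    'D': [0, 1, 3, 4],
    'E': [2, 1, 0, 3],
})

# Column B by hand: each 0 becomes the mean of rows 0..i,
# counting the 0 itself and any value already replaced above it.
b_row1 = (2 + 0) / 2               # first 0 in B
b_row3 = (2 + b_row1 + 4 + 0) / 4  # second 0 reuses b_row1
print(b_row1, b_row3)  # 1.0 1.75
```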
Answer 0 (score: 1)
IIUC
import numpy as np

def f(x):
    for z in range(x.size):
        if x.iloc[z] == 0:
            # mean of rows 0..z, including the 0 and earlier replacements
            x.iloc[z] = np.mean(x.iloc[:z+1])
    return x

df.astype(float).apply(f)
A B C D E
0 1.0 2.00 3.000000 0.0 2.0
1 2.0 1.00 7.000000 1.0 1.0
2 3.0 4.00 3.333333 3.0 1.0
3 1.5 1.75 3.000000 4.0 3.0
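Recomputing `np.mean(x[:z+1])` at every zero rescans the whole prefix each time. The sketch below does the same fill with a running sum, so each column is walked only once. `fill_zeros_running_mean` is a hypothetical name, and the frame is rebuilt from the question so the block runs on its own:

```python
import pandas as pd

def fill_zeros_running_mean(col):
    """Hypothetical helper: replace each 0 with the mean of rows 0..i
    (the 0 itself included), reusing earlier replacements."""
    values = col.to_numpy(dtype=float, copy=True)
    running_sum = 0.0  # sum of the (already filled) values above row i
    for i, v in enumerate(values):
        if v == 0:
            # prefix sum plus the 0 itself, divided by the i + 1 rows
            values[i] = running_sum / (i + 1)
        running_sum += values[i]
    return pd.Series(values, index=col.index, name=col.name)

df = pd.DataFrame({
    'A': [1, 2, 3, 0],
    'B': [2, 0, 4, 0],
    'C': [3, 7, 0, 3],
    'D': [0, 1, 3, 4],
    'E': [2, 1, 0, 3],
})
out = df.apply(fill_zeros_running_mean)
print(out)
```

Because `running_sum` already includes the filled values, the second 0 in `B` gets `(2 + 1 + 4) / 4`, matching the expected output.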
Answer 1 (score: 1)
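The main problem is that when a column contains more than one 0, each mean depends on the previously replaced 0 values, so a vectorized solution is really problematic: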
def f(x):
    for i, v in enumerate(x):
        if v == 0:
            x.iloc[i] = x.iloc[:i+1].mean()
    return x
df1 = df.astype(float).apply(f)
print (df1)
A B C D E
0 1.0 2.00 3.000000 0.0 2.0
1 2.0 1.00 7.000000 1.0 1.0
2 3.0 4.00 3.333333 3.0 1.0
3 1.5 1.75 3.000000 4.0 3.0
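To see why the naive vectorized route fails, this sketch compares an expanding mean over the *original* values (`DataFrame.expanding().mean()` combined with `DataFrame.where`) with the loop above. They disagree only in column B, where the second 0 should average in the first replacement:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 0],
    'B': [2, 0, 4, 0],
    'C': [3, 7, 0, 3],
    'D': [0, 1, 3, 4],
    'E': [2, 1, 0, 3],
})

# Naive vectorized attempt: the expanding mean sees the original 0s,
# not the values that replaced them.
naive = df.where(df != 0, df.expanding().mean())

# Loop from the answer: each mean sees earlier replacements.
def fill(col):
    col = col.astype(float)
    for i in range(len(col)):
        if col.iloc[i] == 0:
            col.iloc[i] = col.iloc[:i + 1].mean()
    return col

iterative = df.apply(fill)

print(naive['B'].tolist())      # [2.0, 1.0, 4.0, 1.5]
print(iterative['B'].tolist())  # [2.0, 1.0, 4.0, 1.75]
```

Columns with a single 0 come out the same either way. Only a second 0 in a column exposes the dependency on earlier replacements.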
A better solution:
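import numpy as np
import pandas as pd

# floats, so the means can be stored without the incompatible-dtype
# warning newer pandas emits when writing a float into an int column
df = df.astype(float)
# collect the (row, column) indices of the zeros in a helper DataFrame
a, b = np.where(df.values == 0)
df1 = pd.DataFrame({'rows':a, 'cols':b})
# a 0 in the first row is the mean of itself, so it can be skipped
df1 = df1[df1['rows'] != 0]
print (df1)
rows cols
1 1 1
2 2 2
3 2 4
4 3 0
5 3 1
# loop over the rows of the helper DataFrame and assign the means
for i in df1.itertuples():
    df.iloc[i.rows, i.cols] = df.iloc[:i.rows+1, i.cols].mean()
print (df)
A B C D E
0 1.0 2.00 3.000000 0.0 2.0
1 2.0 1.00 7.000000 1.0 1.0
2 3.0 4.00 3.333333 3.0 1.0
3 1.5 1.75 3.000000 4.0 3.0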
Another similar solution, looping over every (row, column) pair of zeros and assigning the mean:
df = df.astype(float)  # floats, so the means fit without a dtype conflict
for i, j in zip(*np.where(df.values == 0)):
    df.iloc[i, j] = df.iloc[:i+1, j].mean()
print (df)
A B C D E
0 1.0 2.00 3.000000 0.0 2.0
1 2.0 1.00 7.000000 1.0 1.0
2 3.0 4.00 3.333333 3.0 1.0
3 1.5 1.75 3.000000 4.0 3.0
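Both loops depend on an ordering the answer leaves implicit. `np.where` on a 2-D array returns indices in row-major order, so within each column the zeros are filled top to bottom, and a later mean (for example at row 3 of `B`) already includes the earlier replacement. Here is a quick check on the question's values:

```python
import numpy as np

arr = np.array([[1, 2, 3, 0, 2],
                [2, 0, 7, 1, 1],
                [3, 4, 0, 3, 0],
                [0, 0, 3, 4, 3]])

# Indices of the zeros, converted to plain ints for readable printing
pairs = [(int(i), int(j)) for i, j in zip(*np.where(arr == 0))]
print(pairs)  # [(0, 3), (1, 1), (2, 2), (2, 4), (3, 0), (3, 1)]

# Row-major order means the row index never decreases,
# so each column's zeros come out top to bottom.
rows = [i for i, _ in pairs]
print(rows == sorted(rows))  # True
```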