I have the following DataFrame, and I want to apply ffill as described below.
Data:
print(for_stack.to_dict())
{2.0: {'A_cj8e134xu02pixvky4r70o0se': 1.0, 'A_cj8t63fsb04ga5bm4ongrlx6h': 1.0},
3.0: {'A_cj8e134xu02pixvky4r70o0se': 2.0, 'A_cj8t63fsb04ga5bm4ongrlx6h': nan},
4.0: {'A_cj8e134xu02pixvky4r70o0se': 3.0, 'A_cj8t63fsb04ga5bm4ongrlx6h': 1.0},
5.0: {'A_cj8e134xu02pixvky4r70o0se': 4.0, 'A_cj8t63fsb04ga5bm4ongrlx6h': nan},
6.0: {'A_cj8e134xu02pixvky4r70o0se': 5.0, 'A_cj8t63fsb04ga5bm4ongrlx6h': 1.0},
7.0: {'A_cj8e134xu02pixvky4r70o0se': 6.0, 'A_cj8t63fsb04ga5bm4ongrlx6h': 2.0},
8.0: {'A_cj8e134xu02pixvky4r70o0se': 7.0, 'A_cj8t63fsb04ga5bm4ongrlx6h': 3.0},
9.0: {'A_cj8e134xu02pixvky4r70o0se': 8.0, 'A_cj8t63fsb04ga5bm4ongrlx6h': nan},
10.0: {'A_cj8e134xu02pixvky4r70o0se': nan, 'A_cj8t63fsb04ga5bm4ongrlx6h': 1.0}}
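I want to apply ffill only where the value being carried forward is 8. That should produce the desired output below (note that a NaN is filled only when the fill value is 8):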
{2.0: {'A_cj8e134xu02pixvky4r70o0se': 1.0, 'A_cj8t63fsb04ga5bm4ongrlx6h': 1.0},
3.0: {'A_cj8e134xu02pixvky4r70o0se': 2.0, 'A_cj8t63fsb04ga5bm4ongrlx6h': nan},
4.0: {'A_cj8e134xu02pixvky4r70o0se': 3.0, 'A_cj8t63fsb04ga5bm4ongrlx6h': 1.0},
5.0: {'A_cj8e134xu02pixvky4r70o0se': 4.0, 'A_cj8t63fsb04ga5bm4ongrlx6h': nan},
6.0: {'A_cj8e134xu02pixvky4r70o0se': 5.0, 'A_cj8t63fsb04ga5bm4ongrlx6h': 1.0},
7.0: {'A_cj8e134xu02pixvky4r70o0se': 6.0, 'A_cj8t63fsb04ga5bm4ongrlx6h': 2.0},
8.0: {'A_cj8e134xu02pixvky4r70o0se': 7.0, 'A_cj8t63fsb04ga5bm4ongrlx6h': 3.0},
9.0: {'A_cj8e134xu02pixvky4r70o0se': 8.0, 'A_cj8t63fsb04ga5bm4ongrlx6h': nan},
10.0: {'A_cj8e134xu02pixvky4r70o0se': 8.0, 'A_cj8t63fsb04ga5bm4ongrlx6h': 1.0}}
Any help would be great!
Answer 0 (score: 2)
So basically, you want to fill a nan with 8 only if the previous value is 8:
df[df.shift().eq(8) & df.isnull()] = 8
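To illustrate what the one-liner above does, here is a small sketch on a made-up single-column frame (the column name `a` and the values are assumptions for illustration). It fills only the NaN directly below an 8, so a run of several NaNs is not filled all the way down.

```python
import numpy as np
import pandas as pd

# Hypothetical data: an 8 followed by two NaNs, then a 1 followed by a NaN
df = pd.DataFrame({"a": [8, np.nan, np.nan, 1, np.nan]})

# True only where the cell is NaN and the row directly above holds 8
mask = df.shift().eq(8) & df.isnull()

# Replace the masked cells with 8, leaving everything else untouched
result = df.mask(mask, 8)
print(result)
```

Only the first NaN after the 8 is replaced; the second NaN in the run and the NaN after the 1 stay NaN.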
I missed the ffill part. Try this naive loop:
for col in df.columns:
    filters = df[col].eq(8) | df[col].isnull()
    df.loc[filters, col] = df.loc[filters, col].ffill()
Edit 2: I left in a hurry this morning and didn't double-check. The fix:
for col in df.columns:
    # label each non-NaN row together with the NaN rows that follow it as one block
    filters = (~df[col].isna()).cumsum()
    # collect the labels of the blocks that start with 8
    eq8 = filters[df[col].eq(8)]
    # select every row that belongs to one of those blocks
    filters = filters.isin(eq8)
    # fill those blocks with 8
    df.loc[filters, col] = 8
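The same rule can also be sketched without a Python loop: forward-fill everything, keep only the carried values that equal 8, and use those to fill the original NaNs. This is an alternative offered for comparison, not part of the answer above; the helper name `ffill_only` and its `value` and `axis` parameters are hypothetical. The question's data holds each series in a row, so the example fills along `axis=1`.

```python
import numpy as np
import pandas as pd

def ffill_only(df, value=8, axis=0):
    """Forward-fill NaNs, but only where the carried value equals `value`."""
    filled = df.ffill(axis=axis)
    # Keep forward-filled cells equal to `value`; every other cell becomes NaN
    candidates = filled.where(filled.eq(value))
    # fillna with a DataFrame aligns on labels; NaN candidates leave the gap as NaN
    return df.fillna(candidates)

nan = np.nan
# The question's data, rebuilt by hand from the printed dict
for_stack = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6, 7, 8, nan],
     [1, nan, 1, nan, 1, 2, 3, nan, 1]],
    index=["A_cj8e134xu02pixvky4r70o0se", "A_cj8t63fsb04ga5bm4ongrlx6h"],
    columns=[float(c) for c in range(2, 11)],
    dtype=float,
)

result = ffill_only(for_stack, value=8, axis=1)
print(result)
```

With the default `axis=0` the same helper fills down each column, which matches the layout the loop above assumes.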
Answer 1 (score: 1)
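This is far from ideal, and it leaves an interesting open question: why does the function cond_fill only work on a DataFrame with one column? Add a second column and it is not applied.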
import pandas as pd
import numpy as np
print(pd.__version__)
df = pd.DataFrame(np.random.choice([1,np.nan,8], size=(10,1)), columns=['a'])
#df = pd.DataFrame(np.random.choice([1,np.nan,8], size=(10,2)), columns=['a', 'b'])
cols = df.columns
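def cond_fill(s):
    fill = False
    # Series.items() replaces iteritems(), which was removed in pandas 2.0
    for i, x in s.items():
        # set a '9' so we can see the change
        if pd.isnull(x) and fill:
            s.loc[i] = 9
        else:
            fill = False
        if x == 8:
            fill = True
    # note: this returns the last scalar, and the result of apply is discarded below
    return x

df.apply(cond_fill)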
print(df)
which produces:
0.24.2
a
0 NaN
1 1.0
2 NaN
3 NaN
4 8.0
5 9.0
6 1.0
7 NaN
8 8.0
9 9.0
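One likely reason for the single-column behaviour is that `cond_fill` relies on mutating the Series that `apply` passes in and returns the scalar `x`; pandas does not guarantee that such in-place writes reach the original frame. Below is a minimal sketch of a variant that writes to a copy, returns the Series, and assigns the result of `apply` back. The fixed input data and the `fill_value` parameter are assumptions for illustration, and it fills with 8 rather than the marker value 9.

```python
import numpy as np
import pandas as pd

def cond_fill(s, fill_value=8):
    """Fill NaNs that directly follow `fill_value`, including runs of NaNs."""
    out = s.copy()           # never mutate the Series handed in by apply
    fill = False
    for i, x in s.items():
        if pd.isnull(x) and fill:
            out.loc[i] = fill_value
        else:
            # keep filling only while the last non-NaN value was fill_value
            fill = (x == fill_value)
    return out               # return the whole Series so apply can rebuild the frame

nan = np.nan
df = pd.DataFrame({
    "a": [nan, 1, nan, 8, nan, nan, 1, nan],
    "b": [8, nan, 1, nan, 8, nan, nan, 8],
})

df = df.apply(cond_fill)     # assign the result instead of relying on side effects
print(df)
```

Because the function returns a new Series per column, it behaves the same for one column or many.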
Answer 2 (score: 1)
Here is a completely different approach that works for n columns and is fast:
import pandas as pd
import numpy as np
print(pd.__version__)
df = pd.DataFrame(np.random.choice([1,np.nan,8], size=(10,2)), columns=['a', 'b'])
print(df)
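for col in df.columns:
    # helper column 1: every NaN replaced by 8
    new_col_1 = "{}_1".format(col)
    df[new_col_1] = df[col].fillna(8)
    # helper column 2: plain forward fill
    new_col_2 = "{}_2".format(col)
    df[new_col_2] = df[col].ffill()
    df[col] = df[col].ffill()
    # where the helpers disagree, the carried value was not 8, so restore NaN
    # (df.loc avoids the chained assignment df[col][mask] = ...)
    df.loc[df[new_col_1] != df[new_col_2], col] = np.nan
    df.drop([new_col_1, new_col_2], axis=1, inplace=True)
print(df)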
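Because the example above uses random data, its result changes on every run. As a rough check of the comparison trick, here is the same idea on a fixed Series (the values are made up for illustration), using two temporary Series instead of helper columns.

```python
import numpy as np
import pandas as pd

nan = np.nan
s = pd.Series([nan, 8, nan, nan, 1, nan, 8, nan])

as_eight = s.fillna(8)   # every NaN replaced by 8
carried = s.ffill()      # plain forward fill

# Keep the forward-filled value only where both versions agree:
# original values always agree, and a filled NaN agrees only if 8 was carried
result = carried.where(as_eight == carried)
print(result.tolist())
```

The leading NaN and the NaN after the 1 stay NaN, while every NaN that follows an 8 becomes 8.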