Question

I have the following DataFrame and want to apply ffill as described below:

Data:

print(for_stack.to_dict())
{2.0: {'A_cj8e134xu02pixvky4r70o0se': 1.0, 'A_cj8t63fsb04ga5bm4ongrlx6h': 1.0},
 3.0: {'A_cj8e134xu02pixvky4r70o0se': 2.0, 'A_cj8t63fsb04ga5bm4ongrlx6h': nan},
 4.0: {'A_cj8e134xu02pixvky4r70o0se': 3.0, 'A_cj8t63fsb04ga5bm4ongrlx6h': 1.0},
 5.0: {'A_cj8e134xu02pixvky4r70o0se': 4.0, 'A_cj8t63fsb04ga5bm4ongrlx6h': nan},
 6.0: {'A_cj8e134xu02pixvky4r70o0se': 5.0, 'A_cj8t63fsb04ga5bm4ongrlx6h': 1.0},
 7.0: {'A_cj8e134xu02pixvky4r70o0se': 6.0, 'A_cj8t63fsb04ga5bm4ongrlx6h': 2.0},
 8.0: {'A_cj8e134xu02pixvky4r70o0se': 7.0, 'A_cj8t63fsb04ga5bm4ongrlx6h': 3.0},
 9.0: {'A_cj8e134xu02pixvky4r70o0se': 8.0, 'A_cj8t63fsb04ga5bm4ongrlx6h': nan},
 10.0: {'A_cj8e134xu02pixvky4r70o0se': nan, 'A_cj8t63fsb04ga5bm4ongrlx6h': 1.0}}

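I want to apply ffill only where the value being carried forward is 8, which should produce the desired output below (note that a NaN is filled only when the fill value is 8):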

{2.0: {'A_cj8e134xu02pixvky4r70o0se': 1.0, 'A_cj8t63fsb04ga5bm4ongrlx6h': 1.0},
 3.0: {'A_cj8e134xu02pixvky4r70o0se': 2.0, 'A_cj8t63fsb04ga5bm4ongrlx6h': nan},
 4.0: {'A_cj8e134xu02pixvky4r70o0se': 3.0, 'A_cj8t63fsb04ga5bm4ongrlx6h': 1.0},
 5.0: {'A_cj8e134xu02pixvky4r70o0se': 4.0, 'A_cj8t63fsb04ga5bm4ongrlx6h': nan},
 6.0: {'A_cj8e134xu02pixvky4r70o0se': 5.0, 'A_cj8t63fsb04ga5bm4ongrlx6h': 1.0},
 7.0: {'A_cj8e134xu02pixvky4r70o0se': 6.0, 'A_cj8t63fsb04ga5bm4ongrlx6h': 2.0},
 8.0: {'A_cj8e134xu02pixvky4r70o0se': 7.0, 'A_cj8t63fsb04ga5bm4ongrlx6h': 3.0},
 9.0: {'A_cj8e134xu02pixvky4r70o0se': 8.0, 'A_cj8t63fsb04ga5bm4ongrlx6h': nan},
 10.0: {'A_cj8e134xu02pixvky4r70o0se': 8.0, 'A_cj8t63fsb04ga5bm4ongrlx6h': 1.0}}

Any help would be great!

Answer 1

So basically, you want to fill a NaN with 8 only if the previous value is 8:

df[df.shift().eq(8) & df.isnull()] = 8
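As a small illustration of what this one-liner does, here is a sketch on a made-up single-column frame (the column name `a` and its values are assumptions, not the asker's data). It suggests that only the NaN directly after an 8 gets filled, while a second consecutive NaN is left alone:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: an 8 followed by two NaNs, then a 1 followed by a NaN.
df = pd.DataFrame({"a": [8, np.nan, np.nan, 1, np.nan]})

out = df.copy()
# True only where the cell is NaN and the cell one row above is 8.
mask = out.shift().eq(8) & out.isnull()
out[mask] = 8

print(out["a"].tolist())  # [8.0, 8.0, nan, 1.0, nan]
```

Because `shift()` looks back exactly one row, a run of several NaNs after an 8 is only filled in its first position.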

I missed the ffill part. Try this naive loop:

for col in df.columns:
    filters = df[col].eq(8) | df[col].isnull()
    df.loc[filters,col] = df.loc[filters,col].ffill()
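A quick check of this loop on a made-up Series (the values are an assumption chosen for illustration) hints at a weakness: because the forward fill runs only over the 8s and NaNs, a NaN that actually follows a different value can still pick up an 8.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the NaN at position 2 follows a 1, not an 8.
s = pd.Series([8, 1, np.nan, 8, np.nan])

mask = s.eq(8) | s.isnull()
naive = s.copy()
# Forward fill restricted to the rows that are 8 or NaN.
naive.loc[mask] = naive.loc[mask].ffill()

print(naive.tolist())  # [8.0, 1.0, 8.0, 8.0, 8.0]
```

Position 2 ends up as 8 even though the value before it is 1, which is not what the question asks for.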

Edit 2: I left in a hurry this morning and did not double-check. The fix:

for col in df.columns:
    # label each non-NaN value and the NaNs that follow it as one block
    filters = (~df[col].isna()).cumsum()

    # collect the labels of the blocks that start with an 8
    eq8 = filters[df[col].eq(8)]

    # select every row that belongs to one of those blocks
    filters = filters.isin(eq8)

    # fill those blocks with 8
    df.loc[filters, col] = 8
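To make the block labelling easier to follow, here is a minimal sketch of the same idea wrapped in a helper that works on one Series. The name `ffill_only_8` and the sample values are hypothetical, and `Series.mask` is used in place of the `.loc` assignment:

```python
import numpy as np
import pandas as pd

def ffill_only_8(s: pd.Series) -> pd.Series:
    """Fill NaNs with 8 only when the last non-NaN value before them is 8."""
    # Each non-NaN value starts a new block; trailing NaNs share its label.
    block = s.notna().cumsum()
    # Labels of the blocks whose first (non-NaN) value is 8.
    starts_with_8 = block[s.eq(8)]
    # Replace every row of those blocks with 8, leave the rest untouched.
    return s.mask(block.isin(starts_with_8), 8)

s = pd.Series([np.nan, 8, np.nan, np.nan, 1, np.nan, 8])
print(s.notna().cumsum().tolist())  # [0, 1, 1, 1, 2, 2, 3]
print(ffill_only_8(s).tolist())     # [nan, 8.0, 8.0, 8.0, 1.0, nan, 8.0]
```

Leading NaNs get label 0, which can never belong to a block that starts with 8, so they stay NaN.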

Answer 2

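This is far from ideal. As first posted, cond_fill modified the Series in place and returned a scalar, which raised an interesting question: the change only showed up on a one-column DataFrame, and with a second column it was not applied. pandas does not support functions that mutate the object passed in by apply, so the version below works on a copy, returns the filled Series, and assigns the result back.

import pandas as pd
import numpy as np
print(pd.__version__)

df = pd.DataFrame(np.random.choice([1,np.nan,8], size=(10,1)), columns=['a'])
#df = pd.DataFrame(np.random.choice([1,np.nan,8], size=(10,2)), columns=['a', 'b'])

cols = df.columns

def cond_fill(s):
    s = s.copy()  # do not mutate the Series that apply passes in
    fill = False
    for i, x in s.items():
        if pd.isnull(x) and fill:
            # set a '9' so we can see the change
            s.loc[i] = 9
        elif x == 8:
            fill = True
        else:
            fill = False

    return s

df = df.apply(cond_fill)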

print(df)

Sample output (the input is random, so values will differ between runs):

0.24.2
     a
0  NaN
1  1.0
2  NaN
3  NaN
4  8.0
5  9.0
6  1.0
7  NaN
8  8.0
9  9.0

Answer 3

Here is a completely different approach that works for n columns and is fast:

import pandas as pd
import numpy as np
print(pd.__version__)

df = pd.DataFrame(np.random.choice([1,np.nan,8], size=(10,2)), columns=['a', 'b'])

print(df)

for col in df.columns:
    new_col_1 = "{}_1".format(col)
    df[new_col_1] = df[col].fillna(8)
    new_col_2 = "{}_2".format(col)
    df[new_col_2] = df[col].ffill()

    df[col] = df[col].ffill()
    # .loc avoids chained assignment, which may write to a temporary copy
    df.loc[df[new_col_1] != df[new_col_2], col] = np.nan
    df.drop([new_col_1, new_col_2], axis=1, inplace=True)

print(df)
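The same comparison can probably be done on the whole frame at once, without temporary columns. The sketch below is one possible variant, not the answerer's code: it forward-fills, keeps only the candidates equal to 8, and passes them to `fillna`. It uses the values from the question with shortened, hypothetical row labels (`row_a`, `row_b`), and fills along `axis=1` because in the question's frame the sequence runs across the columns 2.0 to 10.0.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Same values as the question's frame; the row labels are shortened stand-ins.
for_stack = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6, 7, 8, nan],
     [1, nan, 1, nan, 1, 2, 3, nan, 1]],
    index=["row_a", "row_b"],
    columns=[float(c) for c in range(2, 11)],
)

# Forward-fill along each row, then keep only the candidates equal to 8.
candidates = for_stack.ffill(axis=1)
only_8 = candidates.where(candidates.eq(8))

# fillna with a DataFrame touches only the original NaN cells, aligned by label.
result = for_stack.fillna(only_8)
print(result)
```

For a frame where the sequence runs down the rows instead, dropping `axis=1` should give the column-wise equivalent.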

Using fillna with a condition in pandas
