Question

For each group, how can I subset or select the rows that come before the first null value in a particular column?

Example:

and select only:

Answer 1

Use groupby with a custom function built on pd.Series.isnull:

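def index_filter(x):
    nulls = x.isnull()
    if not nulls.any():
        return x  # no nulls in this group: keep every row
    # label just before the first null (assumes a default integer index)
    n = nulls[nulls].index[0] - 1
    return x.loc[:n]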

res = df.groupby('id')['sales']\
        .apply(index_filter).astype(int)\
        .reset_index().drop('level_1', axis=1)

Alternatively, you can use a generator expression with next:

import numpy as np

def index_filter(x):
    # position of the first NaN, or len(x) if the group has none
    n = next((i for i, j in enumerate(x) if np.isnan(j)), len(x))
    return x.iloc[:n]

Result:

print(res)

   id  sales
0  12      1
1  12      3
2  15      4
3  15      6
4  15      9
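For anyone who wants to run this end to end, here is a minimal self-contained sketch of the generator-based variant. The sample DataFrame is an assumption: the question's original data is not reproduced here, so the values below are hypothetical and were only chosen to be consistent with the output shown above. The function name rows_before_first_null is also just illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not the asker's original frame).
df = pd.DataFrame({
    "id":    [12, 12, 12, 12, 15, 15, 15, 15],
    "sales": [1, 3, np.nan, 2, 4, 6, 9, np.nan],
})

def rows_before_first_null(x):
    # Position of the first NaN in the group, or len(x) if there is none.
    n = next((i for i, v in enumerate(x) if pd.isna(v)), len(x))
    return x.iloc[:n]

res = (
    df.groupby("id")["sales"]
      .apply(rows_before_first_null)
      .astype(int)               # safe here: the remaining rows contain no NaN
      .reset_index(level=0)      # move the group key 'id' back into a column
      .reset_index(drop=True)    # discard the original row labels
)
print(res)
```

Because the slice is positional (iloc), this version does not depend on the index being a default integer range, and a group without any nulls is returned unchanged.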

Answer 2

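One approach is to take the null mask, group it by id, and take the cumulative sum within each group. Every row before the first null in its group then has a count of 0, while the first null and every row after it have a count of 1 or more. Then select the rows whose count is 0. In other words: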

In [19]: df.loc[df["sales"].isnull().groupby(df["id"]).cumsum() < 1]
Out[19]: 
   id  sales
0  12    1.0
1  12    3.0
4  15    4.0
5  15    6.0
6  15    9.0
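To make the intermediate step visible, the following sketch prints the per-group running null count before applying it as a mask. The DataFrame is again hypothetical (the original data is not shown), and the explicit cast to int is my own addition so the cumulative sum is clearly an integer count.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not the asker's original frame).
df = pd.DataFrame({
    "id":    [12, 12, 12, 12, 15, 15, 15, 15],
    "sales": [1, 3, np.nan, 2, 4, 6, 9, np.nan],
})

is_null = df["sales"].isnull()

# Running count of nulls seen so far within each id group.
nulls_so_far = is_null.astype(int).groupby(df["id"]).cumsum()
print(nulls_so_far.tolist())

# Keep only the rows where no null has been seen yet in the group.
keep = nulls_so_far == 0
print(df.loc[keep])
```

This stays fully vectorised and keeps the original row labels, which is why the output above shows index values 0, 1, 4, 5, 6 rather than a fresh 0 to 4 range.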

Select rows before a null value in each group
