Question

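I want to index a Pandas dataframe using a boolean mask, then set a value in a subset of the filtered dataframe based on an integer index, and have that value reflected in the original dataframe. In other words, I would be happy if this operated on a view of the dataframe.

Example: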

In [293]:

import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2, 3, 4, 5, 6, 7],
                   'b': [5, 5, 2, 2, 5, 5, 2, 2],
                   'c': [0, 0, 0, 0, 0, 0, 0, 0]})

mask = (df['a'] < 7) & (df['b'] == 2)
df.loc[mask, 'c']

Out[293]:
2    0
3    0
6    0
Name: c, dtype: int64

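Now I want to set the values of the first two elements returned in the filtered dataframe. Chaining an iloc onto the loc call above works for indexing: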

In [294]:

df.loc[mask, 'c'].iloc[0: 2]

Out[294]:

2    0
3    0
Name: c, dtype: int64

But not for assigning:

In [295]:

df.loc[mask, 'c'].iloc[0: 2] = 1

print(df)

   a  b  c
0  0  5  0
1  1  5  0
2  2  2  0
3  3  2  0
4  4  5  0
5  5  5  0
6  6  2  0
7  7  2  0

Making the assigned value the same length as the slice (i.e. = [1, 1]) doesn't work either. Is there a way to assign these values?

Answer 1

This does work but is a little ugly. Basically, we take the index generated from the mask and make an additional call to loc:

In [57]:

df.loc[df.loc[mask,'c'].iloc[0:2].index, 'c'] = 1
df
Out[57]:
   a  b  c
0  0  5  0
1  1  5  0
2  2  2  1
3  3  2  1
4  4  5  0
5  5  5  0
6  6  2  0
7  7  2  0

So, breaking the above down:

In [60]:
# take the index from the mask and iloc
df.loc[mask, 'c'].iloc[0: 2]
Out[60]:
2    0
3    0
Name: c, dtype: int64
In [61]:
# call loc using this index, we can now use this to select column 'c' and set the value
df.loc[df.loc[mask,'c'].iloc[0:2].index]
Out[61]:
   a  b  c
2  2  2  0
3  3  2  0

Answer 2

How about:

ix = df.index[mask][:2]
df.loc[ix, 'c'] = 1

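Same idea as EdChum's, but more elegant, as suggested in the comments.

EDIT: You have to be a little careful with this one, because it may give unwanted results with a non-unique index: more than one row could be indexed by either of the labels in ix above. If the index is non-unique and you only want the first 2 (or n) rows that satisfy the boolean key, it is safer to use .iloc with integer indexing: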

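import numpy as np
ix = np.where(mask)[0][:2]
# .iloc is purely positional, so pass the integer position of column 'c', not its label
df.iloc[ix, df.columns.get_loc('c')] = 1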
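
To make the caveat concrete, here is a small sketch on a made-up frame whose index repeats the label 0 (the frame and the names `by_label`, `by_pos` and `pos` are assumptions for illustration, not part of the question). The label-based version also touches a row the mask excluded, while the positional version does not:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: index label 0 appears twice.
df = pd.DataFrame({'a': [0, 1, 2, 3],
                   'b': [2, 2, 5, 2],
                   'c': [0, 0, 0, 0]},
                  index=[0, 1, 0, 2])
mask = (df['b'] == 2).to_numpy()  # True, True, False, True

# Label-based: label 0 also selects the third row, which the mask rejected.
by_label = df.copy()
ix = by_label.index[mask][:2]
by_label.loc[ix, 'c'] = 1

# Position-based: only the first two rows where the mask is True are touched.
by_pos = df.copy()
pos = np.where(mask)[0][:2]
by_pos.iloc[pos, by_pos.columns.get_loc('c')] = 1

print(by_label)
print(by_pos)
```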

Answer 3

I don't know if this is any more elegant, but it's a little different:

mask = mask & (mask.cumsum() < 3)

df.loc[mask, 'c'] = 1

   a  b  c
0  0  5  0
1  1  5  0
2  2  2  1
3  3  2  1
4  4  5  0
5  5  5  0
6  6  2  0
7  7  2  0
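
For anyone wondering why the cumsum trick works, here is the same idea spelled out step by step as a sketch. The names `n`, `running` and `first_n` are mine; `n` is just a stand-in for how many matches to keep. The running count of True values numbers each match 1, 2, 3, ..., so keeping only rows where the mask is True and the count is at most n leaves the first n matches:

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2, 3, 4, 5, 6, 7],
                   'b': [5, 5, 2, 2, 5, 5, 2, 2],
                   'c': [0, 0, 0, 0, 0, 0, 0, 0]})
mask = (df['a'] < 7) & (df['b'] == 2)

n = 2  # illustrative: number of leading matches to keep

# Running count of True values seen so far, row by row.
running = mask.cumsum()

# Keep a row only if it matches and is among the first n matches.
first_n = mask & (running <= n)

df.loc[first_n, 'c'] = 1
print(df)
```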

Pandas indexing by boolean "loc" and subsequent "iloc"
