Question

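I have the pandas DataFrame below.

Now I want to get the last subset of rows that does not match the condition df.a > df.b. To make this easier to follow, if we create a new column from that condition, the DataFrame looks like this.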

   a   b     c
0  5  10  Down
1  6  12  Down
2  9   4    Up
3  8   3    Up
4  3   6  Down
5  2   7  Down
6  4   5  Down

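From the above, I want the last run of consecutive rows in which df.c has the same value "Down", which means the output should contain the last three rows, as shown below.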

   a   b     c
4  3   6  Down
5  2   7  Down
6  4   5  Down

I wrote the code below, but I don't know how to proceed from here.

import pandas as pd
import numpy as np


df = pd.DataFrame([[5, 10], [6, 12], [9, 4], [8, 3], [3, 6], [2, 7], [4, 5]], columns=["a", "b"])
df['c'] = np.where(df.a > df.b,'Up','Down')
print(df)

Please help me.

Answer 1

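One solution for getting the last group filled with Down values: first label each run of consecutive equal values with a group number s, built with Series.ne, Series.shift and Series.cumsum. Then create a mask m of the Down rows with Series.eq, use it to filter s, and take the maximum to get the number of the last Down group. Finally, compare s with that number and use the resulting mask for boolean indexing: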

s = df['c'].ne(df['c'].shift()).cumsum()
m = df['c'].eq('Down')
df = df[s.eq(s[m].max())]
print (df)
   a  b     c
4  3  6  Down
5  2  7  Down
6  4  5  Down

Details:

print (s)
0    1
1    1
2    2
3    2
4    3
5    3
6    3
Name: c, dtype: int32

print (m)
0     True
1     True
2    False
3    False
4     True
5     True
6     True
Name: c, dtype: bool

print (s[m])
0    1
1    1
4    3
5    3
6    3
Name: c, dtype: int32

print (s[m].max())
3

print (s.eq(s[m].max()))
0    False
1    False
2    False
3    False
4     True
5     True
6     True
Name: c, dtype: bool
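
For reuse, the steps above can be wrapped in a small helper. This is only a sketch: the function name last_run and its parameters are hypothetical, and the guard for the case where no row matches is an addition that the original answer does not cover.

```python
import numpy as np
import pandas as pd


def last_run(frame, column, value):
    """Return the rows of the last consecutive run where frame[column] == value."""
    # A new run number starts whenever the value differs from the previous row.
    run_id = frame[column].ne(frame[column].shift()).cumsum()
    # Rows that hold the value we are looking for.
    mask = frame[column].eq(value)
    if not mask.any():
        # Assumption: with no matching rows, return an empty frame
        # that keeps the original columns.
        return frame.iloc[0:0]
    # The largest run number among matching rows identifies the last run.
    return frame[run_id.eq(run_id[mask].max())]


df = pd.DataFrame(
    [[5, 10], [6, 12], [9, 4], [8, 3], [3, 6], [2, 7], [4, 5]],
    columns=["a", "b"],
)
df["c"] = np.where(df.a > df.b, "Up", "Down")

result = last_run(df, "c", "Down")
print(result)
```

Calling last_run(df, "c", "Up") on the same data should select rows 2 and 3 in the same way.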

Answer 2

Here is one using more_itertools.consecutive_groups:

from more_itertools import consecutive_groups
m = df[df['c'].eq('Down')]
df.loc[[list(i) for i in consecutive_groups(m.index)][-1]] #-1 takes the last group

   a  b     c
4  3  6  Down
5  2  7  Down
6  4  5  Down

Where:

[list(i) for i in consecutive_groups(m.index)]

Output:

[[0, 1], [4, 5, 6]]
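
Note that consecutive_groups works on the index labels, so it relies on the index being consecutive integers. If the extra dependency is unwanted, or the index is not a default RangeIndex, the same grouping can be sketched with NumPy on row positions instead. This is an alternative technique, not part of the answer above, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[5, 10], [6, 12], [9, 4], [8, 3], [3, 6], [2, 7], [4, 5]],
    columns=["a", "b"],
)
df["c"] = np.where(df.a > df.b, "Up", "Down")

# Row positions (not index labels) of the rows marked "Down".
pos = np.flatnonzero(df["c"].eq("Down").to_numpy())

# Start a new group wherever two neighbouring positions are not adjacent.
groups = np.split(pos, np.flatnonzero(np.diff(pos) != 1) + 1)

# Select the last group by position.
last_down = df.iloc[groups[-1]]
print([g.tolist() for g in groups])
print(last_down)
```

Because the selection uses iloc, it does not depend on what the index labels are.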

Pandas subset based on group
