Suppose I have a DataFrame that looks like this:
Categories Values
0 Category 0 1
1 Category 0 0
2 Category 0 -1
3 Category 0 0
4 Category 1 1
5 Category 1 0
6 Category 1 -1
7 Category 1 0
8 Category 2 1
9 Category 2 0
10 Category 2 -1
11 Category 2 0
12 Category 3 -1
13 Category 3 0
14 Category 3 0
15 Category 3 1
16 Category 4 -1
17 Category 4 0
18 Category 4 0
19 Category 4 1
20 Category 5 -1
21 Category 5 0
22 Category 5 0
23 Category 5 1
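I'd like a time-efficient way to get two things about the last non-zero entry of Values in each group:
(1): the index,
(2): the entry
Expected output for (1): [2,6,10,15,19,23], as a pandas Series
Expected output for (2): [-1,-1,-1,1,1,1], as a pandas Series
Thanks in advance, everyone.
Edit: added the Python code that generates the DataFrame above: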
import pandas as pd
n = 4
m = 3
df = pd.DataFrame({'Categories': [f'Category {i//n}' for i in range(2*m*n)],
'Values' : [1,0,-1,0]*m+ [-1,0,0,1]*m})
Answer 0 (score: 2)
Use boolean indexing to keep only the rows whose Values are not equal to 0, then call DataFrame.drop_duplicates on the Categories column, keeping only the last duplicate:
df1 = df[df['Values'].ne(0)].drop_duplicates('Categories', keep='last')
print (df1)
Categories Values
2 Category 0 -1
6 Category 1 -1
10 Category 2 -1
15 Category 3 1
19 Category 4 1
23 Category 5 1
print (df1.index.tolist())
[2, 6, 10, 15, 19, 23]
print (df1['Values'].tolist())
[-1, -1, -1, 1, 1, 1]
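The question asks for pandas Series rather than lists. A minimal sketch of that last step, assuming the same df as in the question; the names idx_series and val_series are hypothetical and chosen only for illustration:

```python
import pandas as pd

# Rebuild the DataFrame from the question.
n, m = 4, 3
df = pd.DataFrame({
    'Categories': [f'Category {i // n}' for i in range(2 * m * n)],
    'Values': [1, 0, -1, 0] * m + [-1, 0, 0, 1] * m,
})

# Drop zero rows, then keep the last remaining row of each category.
df1 = df[df['Values'].ne(0)].drop_duplicates('Categories', keep='last')

# Wrap both results in Series, labelled by category (an illustrative choice).
categories = df1['Categories'].to_numpy()
idx_series = pd.Series(df1.index.to_numpy(), index=categories)
val_series = pd.Series(df1['Values'].to_numpy(), index=categories)

print(idx_series)
print(val_series)
```

Labelling by category is just one option; df1['Values'] on its own is already a Series indexed by the original row labels.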
Answer 1 (score: 0)
One workaround:
df['value'] = df.groupby('Categories')['Values'].transform(lambda x: x.loc[x[::-1].ne(0).idxmax()])
df['index'] = df.groupby('Categories')['Values'].transform(lambda x: x[::-1].ne(0).idxmax())
Note: this may not be an efficient way to solve the problem, but I tried this simple solution for you.
Output:
Categories Values value index
0 Category 0 1 -1 2
1 Category 0 0 -1 2
2 Category 0 -1 -1 2
3 Category 0 0 -1 2
4 Category 1 1 -1 6
5 Category 1 0 -1 6
6 Category 1 -1 -1 6
7 Category 1 0 -1 6
8 Category 2 1 -1 10
9 Category 2 0 -1 10
10 Category 2 -1 -1 10
11 Category 2 0 -1 10
12 Category 3 -1 1 15
13 Category 3 0 1 15
14 Category 3 0 1 15
15 Category 3 1 1 15
16 Category 4 -1 1 19
17 Category 4 0 1 19
18 Category 4 0 1 19
19 Category 4 1 1 19
20 Category 5 -1 1 23
21 Category 5 0 1 23
22 Category 5 0 1 23
23 Category 5 1 1 23
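The transform above repeats the result on every row of a group. If one entry per group is enough, the same reversed-search idea can be sketched with groupby(...).apply instead; last_idx and last_val are hypothetical names, and the sketch assumes every group contains at least one non-zero value (for an all-zero group, idxmax would simply return the label of that group's last row):

```python
import pandas as pd

# Rebuild the DataFrame from the question.
n, m = 4, 3
df = pd.DataFrame({
    'Categories': [f'Category {i // n}' for i in range(2 * m * n)],
    'Values': [1, 0, -1, 0] * m + [-1, 0, 0, 1] * m,
})

# For each group, reverse the values and take the label of the first
# non-zero entry, which is the last non-zero entry in original order.
last_idx = df.groupby('Categories')['Values'].apply(
    lambda x: x[::-1].ne(0).idxmax()
)

# Look up the corresponding values by label.
last_val = df.loc[last_idx.to_numpy(), 'Values']

print(last_idx)
print(last_val)
```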
Answer 2 (score: 0)
I'd first filter to the non-zero rows, then groupby:
In [11]: df1 = df[df.Values != 0]
In [12]: df1[df1.groupby("Categories")["Values"].transform(lambda x: x == x.iloc[-1])]
Out[12]:
Categories Values
2 Category 0 -1
6 Category 1 -1
10 Category 2 -1
15 Category 3 1
19 Category 4 1
23 Category 5 1
In [13]: df1[df1.groupby("Categories")["Values"].transform(lambda x: x == x.iloc[-1])].index
Out[13]: Int64Index([2, 6, 10, 15, 19, 23], dtype='int64')
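The mask x == x.iloc[-1] keeps every row whose value equals the group's last non-zero value, so it gives one row per group here only because that value occurs once among each group's non-zero rows. A minimal sketch that selects by position instead, using groupby(...).tail(1); last_rows is a hypothetical name:

```python
import pandas as pd

# Rebuild the DataFrame from the question.
n, m = 4, 3
df = pd.DataFrame({
    'Categories': [f'Category {i // n}' for i in range(2 * m * n)],
    'Values': [1, 0, -1, 0] * m + [-1, 0, 0, 1] * m,
})

# Keep only the non-zero rows, as in the answer above.
df1 = df[df['Values'] != 0]

# tail(1) takes the last remaining row of each group by position,
# so repeated values inside a group do not produce extra matches.
last_rows = df1.groupby('Categories').tail(1)

print(last_rows.index.tolist())
print(last_rows['Values'].tolist())
```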