I have a DataFrame like this:
import pandas as pd

df_1 = pd.DataFrame({
    'ID': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C'],
    'VAL': ['shoes', 'flowers', 'chairs', 'apples', 'dice', 'shoes', 'apples',
            'curtain', 'sand', 'socks', 'necklacs', 'tables', 'dishes', 'apples'],
    'SEQ': [0, 1, 2, 3, 4, 0, 1, 2, 3, 0, 1, 2, 3, 4]
})
ID VAL SEQ
0 A shoes 0
1 A flowers 1
2 A chairs 2
3 A apples 3
4 A dice 4
5 B shoes 0
6 B apples 1
7 B curtain 2
8 B sand 3
9 C socks 0
10 C necklacs 1
11 C tables 2
12 C dishes 3
13 C apples 4
I want to slice the rows up to a given value. For example, within each ID group, keep all rows up to and including the first apples row:
Out[110]:
ID VAL SEQ
0 A shoes 0
1 A flowers 1
2 A chairs 2
3 A apples 3
4 B shoes 0
5 B apples 1
6 C socks 0
7 C necklacs 1
8 C tables 2
9 C dishes 3
10 C apples 4
Answer 0 (score: 5)
Use idxmax, groupby, and concat.
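One way these three pieces could fit together is sketched below. This is a reconstruction under assumptions, not necessarily the answerer's code: the loop, the variable names, and the choice to keep groups that never contain the target are all illustrative.

```python
import pandas as pd

df_1 = pd.DataFrame({
    'ID': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C'],
    'VAL': ['shoes', 'flowers', 'chairs', 'apples', 'dice', 'shoes', 'apples',
            'curtain', 'sand', 'socks', 'necklacs', 'tables', 'dishes', 'apples'],
    'SEQ': [0, 1, 2, 3, 4, 0, 1, 2, 3, 0, 1, 2, 3, 4]
})

pieces = []
for _, group in df_1.groupby('ID', sort=False):
    is_target = group['VAL'].eq('apples')
    if is_target.any():
        # idxmax on a boolean Series gives the index label of the first True;
        # .loc label slicing includes that end label.
        pieces.append(group.loc[:is_target.idxmax()])
    else:
        # Assumption: a group without the target value is kept whole.
        pieces.append(group)

# Stitch the per-group slices back into one frame.
result = pd.concat(pieces)
```

Calling `result.reset_index(drop=True)` afterwards would renumber the rows from 0, as in the output shown in the question.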
Answer 1 (score: 4)
GroupBy.cumsum is your friend:
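# Shift the match flags down one row so the apples row itself is kept,
# filling the first row with False instead of NaN so that row 0 is not dropped.
mask = (df_1['VAL'].eq('apples')
                   .shift(fill_value=False)
                   .astype(float)
                   .groupby(df_1['ID'])
                   .cumsum()
                   .lt(1))
df_1[mask]
ID VAL SEQ
0 A shoes 0
1 A flowers 1
2 A chairs 2
3 A apples 3
5 B shoes 0
6 B apples 1
9 C socks 0
10 C necklacs 1
11 C tables 2
12 C dishes 3
13 C apples 4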
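If an ID group may end with the value you are looking for, the shift solution above, while convenient, is not appropriate: the shifted match leaks into the first row of the next group and excludes that whole group. Use GroupBy.apply with cumsum so that the shift happens inside each group: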
# group_keys=False keeps the original index so the mask aligns with df_1.
mask = (df_1['VAL'].eq('apples')
                   .groupby(df_1['ID'], group_keys=False)
                   .apply(lambda x: x.shift(fill_value=False).astype(int).cumsum())
                   .lt(1))
df_1[mask]
ID VAL SEQ
0 A shoes 0
1 A flowers 1
2 A chairs 2
3 A apples 3
5 B shoes 0
6 B apples 1
9 C socks 0
10 C necklacs 1
11 C tables 2
12 C dishes 3
13 C apples 4
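The difference between a global shift and a per-group shift shows up in a small frame where one group ends with the target value. The sketch below uses hypothetical data and replaces apply with GroupBy.shift, which is an alternative to what this answer uses:

```python
import pandas as pd

# Hypothetical data: group A ends with the target value.
df = pd.DataFrame({
    'ID':  ['A', 'A', 'B', 'B', 'B'],
    'VAL': ['shoes', 'apples', 'socks', 'apples', 'dice'],
})
is_target = df['VAL'].eq('apples')

# Global shift: the True from A's last row lands on B's first row,
# so B's cumulative sum starts at 1 and every B row is dropped.
leaky = (is_target.shift(fill_value=False)
                  .astype(int)
                  .groupby(df['ID'])
                  .cumsum()
                  .lt(1))

# Per-group shift: each group starts from False, so nothing leaks.
safe = (is_target.groupby(df['ID'])
                 .shift(fill_value=False)
                 .astype(int)
                 .groupby(df['ID'])
                 .cumsum()
                 .lt(1))

print(df[leaky])  # only the two A rows
print(df[safe])   # A rows plus B's socks and apples
```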
Answer 2 (score: 2)
I am using transform:
df_1[df_1.index<=df_1.VAL.eq('apples').groupby(df_1['ID']).transform('idxmax')]
Out[856]:
ID VAL SEQ
0 A shoes 0
1 A flowers 1
2 A chairs 2
3 A apples 3
5 B shoes 0
6 B apples 1
9 C socks 0
10 C necklacs 1
11 C tables 2
12 C dishes 3
13 C apples 4
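One edge case may be worth checking with this approach: when a group contains no matching row, idxmax returns that group's first index label, so only the first row of the group survives. The sketch below uses hypothetical data and assumes such groups should be kept whole; that policy is not stated in the question.

```python
import pandas as pd

# Hypothetical data: group B never contains the target value.
df = pd.DataFrame({
    'ID':  ['A', 'A', 'A', 'B', 'B'],
    'VAL': ['shoes', 'apples', 'dice', 'socks', 'sand'],
})
is_target = df['VAL'].eq('apples')
by_id = is_target.astype(int).groupby(df['ID'])

# Index label of the first match per group, broadcast to every row.
# For B (no match) this is the label of B's first row.
first_hit = by_id.transform('idxmax')
naive = df.index.to_numpy() <= first_hit.to_numpy()

# Assumption: groups with no match are kept in full.
has_target = by_id.transform('max').eq(1).to_numpy()
mask = naive | ~has_target

print(df[naive])  # rows 0, 1, 3: B is cut down to its first row
print(df[mask])   # rows 0, 1, 3, 4: B is kept whole
```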