How do I slice the rows in each group from the first row up to the row that contains a specific value?

Date: 2019-06-25 13:29:30

Tags: python pandas

I have a DataFrame like this:

df_1 = pd.DataFrame({
    'ID' : ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C'],
    'VAL' : ['shoes', 'flowers', 'chairs', 'apples', 'dice', 'shoes', 'apples',
             'curtain', 'sand', 'socks', 'necklacs', 'tables', 'dishes', 'apples'],
    'SEQ' : [0, 1, 2, 3, 4, 0, 1, 2, 3, 0, 1, 2, 3, 4]
})

   ID       VAL  SEQ
0   A     shoes    0
1   A   flowers    1
2   A    chairs    2
3   A    apples    3
4   A      dice    4
5   B     shoes    0
6   B    apples    1
7   B   curtain    2
8   B      sand    3
9   C     socks    0
10  C  necklacs    1
11  C    tables    2
12  C    dishes    3
13  C    apples    4

I would like to slice the rows in each ID group from the first row up to and including the row whose VAL is 'apples':

Out[110]: 
   ID       VAL  SEQ
0   A     shoes    0
1   A   flowers    1
2   A    chairs    2
3   A    apples    3
4   B     shoes    0
5   B    apples    1
6   C     socks    0
7   C  necklacs    1
8   C    tables    2
9   C    dishes    3
10  C    apples    4

3 Answers:

Answer 0 (score: 5)

Use groupby with idxmax to locate the first 'apples' row in each group, then concat the per-group slices.
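
A minimal sketch of how groupby, idxmax, and concat can fit together for this task (my reading of the identifiers above, not a verbatim quote of the answer); it assumes every ID group contains at least one 'apples' row, since idxmax on an all-False group would point at the group's first row:

import pandas as pd

# For each ID group, keep the rows from the start of the group up to and
# including the first 'apples' row, then stitch the slices back together.
out = pd.concat(
    g.loc[:g['VAL'].eq('apples').idxmax()]   # .loc slicing is end-inclusive
    for _, g in df_1.groupby('ID')
)
print(out)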

Answer 1 (score: 4)

GroupBy.cumsum is your friend:

mask = (df_1['VAL'].eq('apples')          # True on the 'apples' rows
                   .shift()               # look at the previous row
                   .fillna(False)         # the very first row has no previous row
                   .astype(float)
                   .groupby(df_1['ID'])
                   .cumsum()              # running count of 'apples' seen strictly before each row
                   .lt(1))                # keep rows up to and including the first 'apples'
df_1[mask]

   ID       VAL  SEQ
0   A     shoes    0
1   A   flowers    1
2   A    chairs    2
3   A    apples    3
5   B     shoes    0
6   B    apples    1
9   C     socks    0
10  C  necklacs    1
11  C    tables    2
12  C    dishes    3
13  C    apples    4

If a group could end with the word you are looking for, the shift solution above (although convenient) would be inappropriate, because the un-grouped shift carries the last row of one group into the first row of the next. Use GroupBy.apply together with cumsum instead:

mask = (df_1['VAL'].eq('apples')
                   .groupby(df_1['ID'])
                   .apply(lambda x: x.shift().fillna(0).cumsum())
                   .lt(1))
df_1[mask]

   ID       VAL  SEQ
0   A     shoes    0
1   A   flowers    1
2   A    chairs    2
3   A    apples    3
5   B     shoes    0
6   B    apples    1
9   C     socks    0
10  C  necklacs    1
11  C    tables    2
12  C    dishes    3
13  C    apples    4
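
To see the caveat concretely, consider a hypothetical df_2 in which group A ends with 'apples'. An un-grouped shift leaks that trailing True into group B and wrongly drops B's first row; shifting within each group (shown here with GroupBy.shift for brevity, the same idea as the GroupBy.apply version above) does not:

import pandas as pd

# Hypothetical data: group A ends with 'apples'
df_2 = pd.DataFrame({
    'ID':  ['A', 'A', 'B', 'B', 'B'],
    'VAL': ['shoes', 'apples', 'shoes', 'apples', 'sand'],
})

is_apple = df_2['VAL'].eq('apples')

# Un-grouped shift: A's trailing True spills into B's first row,
# so df_2[naive] loses B's 'shoes'.
naive = (is_apple.shift(fill_value=False).astype(int)
                 .groupby(df_2['ID']).cumsum()
                 .lt(1))

# Shift within each group: every group starts fresh, so B's 'shoes' is kept.
grouped = (is_apple.groupby(df_2['ID']).shift(fill_value=False).astype(int)
                   .groupby(df_2['ID']).cumsum()
                   .lt(1))

print(df_2[naive])    # only the two A rows
print(df_2[grouped])  # the A rows plus B's 'shoes' and 'apples'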

Answer 2 (score: 2)

I am using transform:

df_1[df_1.index <= df_1.VAL.eq('apples').groupby(df_1['ID']).transform('idxmax')]
Out[856]: 
   ID       VAL  SEQ
0   A     shoes    0
1   A   flowers    1
2   A    chairs    2
3   A    apples    3
5   B     shoes    0
6   B    apples    1
9   C     socks    0
10  C  necklacs    1
11  C    tables    2
12  C    dishes    3
13  C    apples    4
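
Here transform('idxmax') broadcasts, to every row of a group, the index label of that group's first 'apples' row, so the index comparison keeps each row at or before that position; note that a group with no 'apples' at all would keep only its first row, since idxmax of an all-False Series is its first label. A quick way to inspect the intermediate Series (values shown for the df_1 above):

first_apple = df_1.VAL.eq('apples').groupby(df_1['ID']).transform('idxmax')
print(first_apple.tolist())
# [3, 3, 3, 3, 3, 6, 6, 6, 6, 13, 13, 13, 13, 13]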