Select rows based on the last occurrence of a string in pandas

Time: 2017-10-16 13:07:41

Tags: python pandas

I have a pandas DataFrame like this:

id   desc
1    Description
1    02.09.2017 15:00 abcd
1    this is a sample description
1    which is continued here also
1    
1    Description
1    01.09.2017 12:00 absd
1    this is another sample description
1    which might be continued here
1    or here
1
2    Description
2    09.03.2017 12:00 abcd
2    another sample again
2    and again
2
2    Description
2    08.03.2017 12:00 abcd
2    another sample again
2    and again times two 

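Basically, each id has several rows that hold information in an unstructured format. For each id, I want to extract the description text that follows the last "Description" row and store it in a single row. The resulting DataFrame should look like this: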

id  desc
1   this is another sample description which might be continued here or here
2   another sample again and again times two

As far as I can tell, I probably need to use groupby, but I don't know what to do after that.

1 Answer:

Answer 0: (score: 1)

Find the position of the last Description row in each group, then join the rows after it using str.cat
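In [2840]: def lastjoin(x):
      ...:     # cumsum() counts the 'Description' rows seen so far; idxmax() returns the
      ...:     # first label where that count peaks, i.e. the last 'Description' row
      ...:     pos = x.desc.eq('Description').cumsum().idxmax()
      ...:     # skip the marker row and the date row (pos+2), then join the remaining lines
      ...:     return x.desc.loc[pos+2:].str.cat(sep=' ')
      ...: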

In [2841]: df.groupby('id').apply(lastjoin)
Out[2841]:
id
1    this is another sample description which might...
2            another sample again and again times two
dtype: object
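
To see why cumsum().idxmax() lands on the last Description row, here is a minimal sketch on a toy Series. The sample values and the names is_marker, running and tail are made up for illustration, and the Series is assumed to have the default integer index, which the pos+2 arithmetic relies on.

```python
import pandas as pd

# Hypothetical toy column with the default 0..n-1 integer index.
desc = pd.Series(
    ["Description", "02.09.2017 15:00 abcd", "first text",
     "Description", "01.09.2017 12:00 absd", "second text", "continued"]
)

is_marker = desc.eq("Description")  # True at labels 0 and 3
running = is_marker.cumsum()        # running count of marker rows: 1, 1, 1, 2, 2, 2, 2
pos = running.idxmax()              # first label where the count peaks -> 3

# pos is the marker row and pos + 1 the date row, so the text starts at pos + 2.
tail = desc.loc[pos + 2:].str.cat(sep=" ")
print(tail)  # second text continued
```

Because idxmax() returns the first label at which the maximum occurs, it picks the row where the running count last increases, which is the last Description row.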

To get the result back as columns, use reset_index

In [3216]: df.groupby('id').apply(lastjoin).reset_index(name='desc')
Out[3216]:
   id                                               desc
0   1  this is another sample description which might...
1   2          another sample again and again times two
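
For anyone who wants to reproduce this end to end, the following is a self-contained sketch. It rebuilds the DataFrame from the question under two assumptions: the blank rows are missing values (None), which str.cat leaves out when joining, and the frame has the default integer index. As a small variation on the answer, it selects the desc column before apply, so lastjoin receives a Series rather than a DataFrame.

```python
import pandas as pd

# Rows reconstructed from the question; blank rows are assumed to be missing values.
rows = [
    (1, "Description"), (1, "02.09.2017 15:00 abcd"),
    (1, "this is a sample description"), (1, "which is continued here also"),
    (1, None),
    (1, "Description"), (1, "01.09.2017 12:00 absd"),
    (1, "this is another sample description"),
    (1, "which might be continued here"), (1, "or here"), (1, None),
    (2, "Description"), (2, "09.03.2017 12:00 abcd"),
    (2, "another sample again"), (2, "and again"), (2, None),
    (2, "Description"), (2, "08.03.2017 12:00 abcd"),
    (2, "another sample again"), (2, "and again times two"),
]
df = pd.DataFrame(rows, columns=["id", "desc"])


def lastjoin(desc):
    # Label of the last 'Description' row within this group.
    pos = desc.eq("Description").cumsum().idxmax()
    # Skip the marker and date rows, then join what is left with spaces.
    return desc.loc[pos + 2:].str.cat(sep=" ")


result = df.groupby("id")["desc"].apply(lastjoin).reset_index(name="desc")
print(result.to_string(index=False))
```

If the blank rows are empty strings instead of missing values, str.cat keeps them and the joined text for id 1 ends with a trailing space; calling .str.strip() on the result column would remove it.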