Pandas: keep certain rows based on strings in other rows

Time: 2020-07-22 11:09:13

Tags: python pandas

I have the following DataFrame:

+-------+------------+
| index |    keep    |
+-------+------------+
|     0 | not useful |
|     1 | start_1    |
|     2 | useful     |
|     3 | end_1      |
|     4 | not useful |
|     5 | start_2    |
|     6 | useful     |
|     7 | useful     |
|     8 | end_2      |
+-------+------------+

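There are two pairs of marker strings (start_1 / end_1 and start_2 / end_2). Only the rows between a start marker and its matching end marker are relevant. So for the DataFrame below, the output should consist only of the rows with index 2, 6 and 7 (2 lies between start_1 and end_1; 6 and 7 lie between start_2 and end_2).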

import pandas as pd

d = {'keep': ["not useful", "start_1", "useful", "end_1", "not useful", "start_2", "useful", "useful", "end_2"]}
df = pd.DataFrame(data=d)

What is the most Pythonic / pandas way to solve this? Thanks.

1 Answer:

Answer 0 (score: 2)

Here is one way to do it (split into several steps for clarity). There may be others:

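# Mark section boundaries: +1 at a start marker, -1 at an end marker, 0 elsewhere.
df["sections"] = 0
df.loc[df.keep.str.startswith("start"), "sections"] = 1
df.loc[df.keep.str.startswith("end"), "sections"] = -1
# The running sum is 1 from a start marker up to (but not including) its end marker.
df["in_section"] = df.sections.cumsum()
# Keep rows inside a section, dropping the start markers themselves.
res = df[(df.in_section == 1) & ~df.keep.str.startswith("start")]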

Output:

   index    keep  sections  in_section
2      2  useful         0           1
6      6  useful         0           1
7      7  useful         0           1
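
The same idea can probably be written without the helper columns. The following is only a sketch, assuming the markers are well formed (every start_N is followed by its end_N, and sections are never nested); the names is_start, is_end, depth and kept_index are made up for illustration and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"keep": ["not useful", "start_1", "useful", "end_1",
                            "not useful", "start_2", "useful", "useful", "end_2"]})

# Boolean masks for the marker rows.
is_start = df["keep"].str.startswith("start")
is_end = df["keep"].str.startswith("end")

# Starts seen so far minus ends seen so far: 1 inside a section, 0 outside
# (assumption: markers alternate and are never nested).
depth = is_start.cumsum() - is_end.cumsum()

# Keep rows inside a section, excluding the start marker rows themselves.
res = df[(depth == 1) & ~is_start]
kept_index = res.index.tolist()
print(kept_index)  # [2, 6, 7]
```

Unlike the step-by-step version, this leaves df unchanged, so res contains only the keep column.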