For the following data:
Name Stage Start End
Hulk 1 21/10/2018 06:34:15 21/10/2018 07:34:15
Hulk 2 21/10/2018 07:34:15 21/10/2018 07:54:15
Hulk 3 21/10/2018 07:58:15 21/10/2018 08:14:15
Hulk 4 21/10/2018 08:14:15 21/10/2018 08:34:15
Sam A1 21/10/2018 09:34:15 21/10/2018 10:34:15
Sam A2 21/10/2018 10:34:15 21/10/2018 10:45:15
Sam A3 21/10/2018 10:45:15 21/10/2018 11:00:15
Sam A4 21/10/2018 11:00:15 21/10/2018 11:34:15
Bruce 1.1 21/10/2018 11:34:15 21/10/2018 11:45:15
Bruce 1.2 21/10/2018 11:45:15 21/10/2018 12:00:15
Bruce 1.3 21/10/2018 12:00:15 21/10/2018 12:25:15
Bruce 1.4 21/10/2018 12:25:15 21/10/2018 12:45:15
Peter 1 21/10/2018 12:45:15 21/10/2018 01:05:15
Peter 1 21/10/2018 01:05:15 21/10/2018 01:15:15
How can I get the first and last instance of each Name based on its Stage, for example where the stages start with 1 and end with 4?
The DataFrame should look like this:
Name Stage Start End
Hulk 1 21/10/2018 06:34:15 21/10/2018 07:34:15
Hulk 4 21/10/2018 08:14:15 21/10/2018 08:34:15
Sam A1 21/10/2018 09:34:15 21/10/2018 10:34:15
Sam A4 21/10/2018 11:00:15 21/10/2018 11:34:15
Bruce 1.1 21/10/2018 11:34:15 21/10/2018 11:45:15
Bruce 1.4 21/10/2018 12:25:15 21/10/2018 12:45:15
I tried combining two methods, but did not get the desired DataFrame shown above.
Answer 0 (score: 3)
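Use duplicated together with str.contains and boolean indexing to first select the candidate rows (the first row of each Name if its Stage contains 1, and the last row of each Name if its Stage contains 4). Then use value_counts with map to keep only the Names for which both rows survived: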
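# True on the first row of each Name
m1 = ~df['Name'].duplicated()
# True where Stage contains the character '1'
m2 = df['Stage'].str.contains('1')
# True on the last row of each Name
m3 = ~df['Name'].duplicated(keep='last')
# True where Stage contains the character '4'
m4 = df['Stage'].str.contains('4')
# keep first rows containing '1' and last rows containing '4'
df1 = df[(m1 & m2) | (m3 & m4)].copy()
# drop Names that did not keep both a first and a last row
df1 = df1[df1['Name'].map(df1['Name'].value_counts()) == 2]
print(df1)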
Name Stage Start End
0 Hulk 1 21/10/2018 06:34:15 21/10/2018 07:34:15
3 Hulk 4 21/10/2018 08:14:15 21/10/2018 08:34:15
4 Sam A1 21/10/2018 09:34:15 21/10/2018 10:34:15
7 Sam A4 21/10/2018 11:00:15 21/10/2018 11:34:15
8 Bruce 1.1 21/10/2018 11:34:15 21/10/2018 11:45:15
11 Bruce 1.4 21/10/2018 12:25:15 21/10/2018 12:45:15
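To try the approach above end to end, here is a minimal self-contained sketch. It assumes the sample data from the question is typed in by hand, with Stage, Start and End all kept as plain strings; the variable names rows, first_in_name, last_in_name, has_1 and has_4 are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data from the question, entered by hand (all columns kept as strings).
rows = [
    ("Hulk", "1", "21/10/2018 06:34:15", "21/10/2018 07:34:15"),
    ("Hulk", "2", "21/10/2018 07:34:15", "21/10/2018 07:54:15"),
    ("Hulk", "3", "21/10/2018 07:58:15", "21/10/2018 08:14:15"),
    ("Hulk", "4", "21/10/2018 08:14:15", "21/10/2018 08:34:15"),
    ("Sam", "A1", "21/10/2018 09:34:15", "21/10/2018 10:34:15"),
    ("Sam", "A2", "21/10/2018 10:34:15", "21/10/2018 10:45:15"),
    ("Sam", "A3", "21/10/2018 10:45:15", "21/10/2018 11:00:15"),
    ("Sam", "A4", "21/10/2018 11:00:15", "21/10/2018 11:34:15"),
    ("Bruce", "1.1", "21/10/2018 11:34:15", "21/10/2018 11:45:15"),
    ("Bruce", "1.2", "21/10/2018 11:45:15", "21/10/2018 12:00:15"),
    ("Bruce", "1.3", "21/10/2018 12:00:15", "21/10/2018 12:25:15"),
    ("Bruce", "1.4", "21/10/2018 12:25:15", "21/10/2018 12:45:15"),
    ("Peter", "1", "21/10/2018 12:45:15", "21/10/2018 01:05:15"),
    ("Peter", "1", "21/10/2018 01:05:15", "21/10/2018 01:15:15"),
]
df = pd.DataFrame(rows, columns=["Name", "Stage", "Start", "End"])

# Position masks: first and last row of each Name.
first_in_name = ~df["Name"].duplicated()
last_in_name = ~df["Name"].duplicated(keep="last")

# Content masks: Stage contains the character '1' or '4'.
has_1 = df["Stage"].str.contains("1")
has_4 = df["Stage"].str.contains("4")

# Keep first rows with a '1' stage and last rows with a '4' stage.
df1 = df[(first_in_name & has_1) | (last_in_name & has_4)].copy()

# Keep only Names for which both rows survived the previous step.
df1 = df1[df1["Name"].map(df1["Name"].value_counts()) == 2]
print(df1)
```

Peter is dropped because his last row has Stage 1, so only one of his rows passes the masks. Note that str.contains matches the character anywhere in Stage, which is what lets A1 and A4 qualify; a strict prefix or suffix check would exclude the Sam rows.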