Question

I have a large amount of data, and my Python pandas DataFrame looks like this:

HR	SBP	DBP	SepsisLabel	PatientID
92	120	80	0	0
98	115	85	0	0
93	125	75	0	0
95	130	90	0	1
102	120	80	1	1
109	115	75	1	1
94	135	100	0	2
97	100	70	0	2
85	120	80	0	2
88	115	75	0	3
93	125	85	1	3
78	130	90	1	3
115	140	110	0	4
102	120	80	0	4
98	140	110	0	4
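
To make the sample reproducible, the table above can be built as a DataFrame along these lines. This is only a sketch: the variable name `df` is an assumption, chosen to match the name used in the answer.

```python
import pandas as pd

# Sample data copied from the table above; `df` is an assumed variable name.
df = pd.DataFrame({
    "HR": [92, 98, 93, 95, 102, 109, 94, 97, 85, 88, 93, 78, 115, 102, 98],
    "SBP": [120, 115, 125, 130, 120, 115, 135, 100, 120, 115, 125, 130, 140, 120, 140],
    "DBP": [80, 85, 75, 90, 80, 75, 100, 70, 80, 75, 85, 90, 110, 80, 110],
    "SepsisLabel": [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0],
    "PatientID": [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
})

print(df.shape)  # (15, 5)
```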

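I want to keep only the rows of those patients (grouped by PatientID) that have at least one row with SepsisLabel = 1. PatientIDs 0, 2 and 4, for example, never have SepsisLabel = 1, so I don't want them in the new DataFrame. I want PatientIDs 1 and 3, which do have SepsisLabel = 1.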

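I hope you understand what I am trying to say. If so, please help me with the Python code. I am fairly sure it needs some condition together with the iloc() function (I may be wrong).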

Regards.

Answer 1

Use GroupBy.transform with GroupBy.any to test whether each group has at least one True, then filter by boolean indexing:

df1 = df[df['SepsisLabel'].eq(1).groupby(df['PatientID']).transform('any')]
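
To see what the one-liner does, it can help to split it into steps. The sketch below uses only the SepsisLabel and PatientID columns for the first three patients of the sample; the intermediate names `has_label` and `mask` are assumptions added for illustration.

```python
import pandas as pd

# First three patients of the sample, two columns only (illustrative subset).
df = pd.DataFrame({
    "SepsisLabel": [0, 0, 0, 0, 1, 1, 0, 0, 0],
    "PatientID":   [0, 0, 0, 1, 1, 1, 2, 2, 2],
})

# Step 1: row-wise test, True where the label is 1.
has_label = df["SepsisLabel"].eq(1)

# Step 2: broadcast "any True in this patient's group" back to every row.
mask = has_label.groupby(df["PatientID"]).transform("any")
print(mask.tolist())
# [False, False, False, True, True, True, False, False, False]

# Step 3: boolean indexing keeps every row of the matching patients.
df1 = df[mask]
print(df1.index.tolist())  # [3, 4, 5]
```

Because transform returns a Series aligned with the original index, the mask can be used directly for filtering, and rows with SepsisLabel = 0 are kept as long as they belong to a patient who has a 1 elsewhere.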

Or select the PatientID values of all rows where SepsisLabel is 1 and filter the original DataFrame by them with Series.isin:

df1 = df[df['PatientID'].isin(df.loc[df['SepsisLabel'].eq(1), 'PatientID'])]
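
The same idea written out in two steps, as a rough sketch on the SepsisLabel and PatientID columns of the sample. The name `ids` is an assumption introduced here, and the call to `unique()` is optional since `isin` also accepts repeated values.

```python
import pandas as pd

# SepsisLabel and PatientID columns from the sample data.
df = pd.DataFrame({
    "SepsisLabel": [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0],
    "PatientID":   [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
})

# Step 1: collect the IDs of patients with at least one SepsisLabel == 1 row.
ids = df.loc[df["SepsisLabel"].eq(1), "PatientID"].unique()
print(ids.tolist())  # [1, 3]

# Step 2: keep all rows whose PatientID is among those IDs.
df1 = df[df["PatientID"].isin(ids)]
print(df1.index.tolist())  # [3, 4, 5, 9, 10, 11]
```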

If the data is small or performance is not important, you can use DataFrameGroupBy.filter:

df1 = df.groupby('PatientID').filter(lambda x: x['SepsisLabel'].eq(1).any())

print(df1)
     HR  SBP  DBP  SepsisLabel  PatientID
3    95  130   90            0          1
4   102  120   80            1          1
5   109  115   75            1          1
9    88  115   75            0          3
10   93  125   85            1          3
11   78  130   90            1          3
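
As a quick sanity check, the three approaches can be run side by side on the sample data to confirm that they select the same rows. This is a sketch only: the names `by_transform`, `by_isin` and `by_filter` are assumptions introduced for the comparison.

```python
import pandas as pd

df = pd.DataFrame({
    "HR": [92, 98, 93, 95, 102, 109, 94, 97, 85, 88, 93, 78, 115, 102, 98],
    "SBP": [120, 115, 125, 130, 120, 115, 135, 100, 120, 115, 125, 130, 140, 120, 140],
    "DBP": [80, 85, 75, 90, 80, 75, 100, 70, 80, 75, 85, 90, 110, 80, 110],
    "SepsisLabel": [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0],
    "PatientID": [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
})

# 1) transform('any'): per-group test broadcast to every row
by_transform = df[df["SepsisLabel"].eq(1).groupby(df["PatientID"]).transform("any")]

# 2) isin: filter by the IDs that have at least one SepsisLabel == 1 row
by_isin = df[df["PatientID"].isin(df.loc[df["SepsisLabel"].eq(1), "PatientID"])]

# 3) filter: call a Python function once per group (usually the slowest option)
by_filter = df.groupby("PatientID").filter(lambda x: x["SepsisLabel"].eq(1).any())

# All three keep the same row labels.
same_rows = (
    by_transform.index.tolist()
    == by_isin.index.tolist()
    == by_filter.index.tolist()
)
print(same_rows)  # True
```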
