I have a pandas DataFrame indexed by datetime, for example:
longitude latitude group_1
timestamp
2019-01-04 08:25:10 47.4900 -18.7983 1.0
2019-01-04 08:25:20 47.4983 -18.8000 1.0
2019-01-04 08:25:28 47.5050 -18.8000 1.0
2019-01-04 08:25:36 47.5133 -18.8000 1.0
2019-01-04 08:25:44 47.5200 -18.7967 1.0
2019-01-04 08:25:52 47.5250 -18.7933 1.0
2019-01-04 08:26:05 47.5367 -18.7867 1.0
2019-01-04 08:26:21 47.5500 -18.7767 1.0
2019-01-04 08:26:34 47.5600 -18.7683 1.0
2019-01-04 08:26:42 47.5683 -18.7633 1.0
2019-01-04 08:27:05 47.5900 -18.7483 1.0
2019-01-04 08:27:53 47.6350 -18.7150 1.0
2019-01-04 08:28:40 47.6817 -18.6783 1.0
2019-01-04 08:33:44 48.0700 -18.3933 NaN
2019-01-04 08:54:05 49.6333 -17.2233 NaN
2019-01-04 08:55:43 49.7233 -17.1667 NaN
2019-01-04 08:57:43 49.8117 -17.1450 NaN
2019-01-04 09:59:44 49.8150 -17.0900 NaN
2019-01-04 10:00:02 49.8133 -17.0767 1.0
2019-01-04 10:00:09 49.8117 -17.0717 1.0
2019-01-04 10:00:31 49.8050 -17.0567 1.0
2019-01-04 10:02:49 49.7483 -16.9183 1.0
2019-01-04 10:39:12 48.5383 -13.6500 1.0
2019-01-04 10:45:31 48.3683 -13.3033 NaN
2019-01-04 10:46:47 48.3317 -13.2933 NaN
2019-01-04 10:47:11 48.3217 -13.3033 NaN
2019-01-04 11:40:01 48.3567 -13.3483 1.0
2019-01-04 11:40:41 48.3500 -13.3917 1.0
2019-01-04 11:41:23 48.3433 -13.4383 1.0
2019-01-04 11:42:07 48.3350 -13.4867 1.0
How do I select the rows corresponding to the first value in each group of 1.0s in group_1? Using the sample data above, the desired output is:
longitude latitude group_1
timestamp
2019-01-04 08:25:10 47.4900 -18.7983 1.0
2019-01-04 10:00:02 49.8133 -17.0767 1.0
2019-01-04 11:40:01 48.3567 -13.3483 1.0
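
For reference, here is a minimal sketch to rebuild a trimmed subset of this DataFrame (assuming pandas and numpy; all values are copied from the table above), so the answers below can be run directly:

import numpy as np
import pandas as pd

# trimmed subset of the sample data, keeping enough rows to cover each NaN/1.0 block
df = pd.DataFrame(
    {
        "longitude": [47.4900, 47.4983, 48.0700, 49.8150,
                      49.8133, 49.8117, 48.3683, 48.3567],
        "latitude": [-18.7983, -18.8000, -18.3933, -17.0900,
                     -17.0767, -17.0717, -13.3033, -13.3483],
        "group_1": [1.0, 1.0, np.nan, np.nan, 1.0, 1.0, np.nan, 1.0],
    },
    index=pd.to_datetime([
        "2019-01-04 08:25:10", "2019-01-04 08:25:20",
        "2019-01-04 08:33:44", "2019-01-04 09:59:44",
        "2019-01-04 10:00:02", "2019-01-04 10:00:09",
        "2019-01-04 10:45:31", "2019-01-04 11:40:01",
    ]),
)
df.index.name = "timestamp"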
Answer 0 (score: 1)
Create a mask that separates the islands of NaN values, then use groupby + idxmax:
u = df['group_1']
# True at the first NaN of each NaN run; cumsum turns these into block labels,
# so each block is one NaN run plus the run of 1.0s that follows it
m = u.isnull() & u.shift().notnull()
# idxmax skips NaN and returns the index of the first 1.0 in each block
ii = u.groupby(m.cumsum()).idxmax()
df.loc[ii]
longitude latitude group_1
timestamp
2019-01-04 08:25:10 47.4900 -18.7983 1.0
2019-01-04 10:00:02 49.8133 -17.0767 1.0
2019-01-04 11:40:01 48.3567 -13.3483 1.0
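
To see why this works, you can print the intermediate block labels (a sketch using the trimmed DataFrame built in the question above):

# cumsum increments at the start of each NaN run, so each label covers
# one NaN run plus the following run of 1.0s
print(m.cumsum().tolist())  # [0, 0, 1, 1, 1, 1, 2, 2] on the trimmed sample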
Answer 1 (score: 1)
You can try the following (it works for blocks of 1 of any shape):
# True on rows that are not 1.0; cumsum gives every run of 1.0s a constant label
s = df['group_1'].ne(1)
blocks = s.cumsum()
# keep only the 1.0 rows, then take the first row of each block
df[~s].groupby(blocks[~s], group_keys=False).head(1)
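
To inspect the grouping, a sketch on the trimmed DataFrame from the question:

# after dropping the non-1.0 rows, each run of 1.0s keeps a constant label
print(blocks[~s].tolist())  # [0, 0, 2, 2, 3] on the trimmed sample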
Or, without groupby (this works when the blocks of 1 are interleaved with nan):
# previous value (NaN filled with 0) < current value only on the first 1.0 of each run
df[df.group_1.shift().fillna(0).lt(df.group_1)]
Output:
longitude latitude group_1
timestamp
2019-01-04 08:25:10 47.4900 -18.7983 1.0
2019-01-04 10:00:02 49.8133 -17.0767 1.0
2019-01-04 11:40:01 48.3567 -13.3483 1.0
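
As a sanity check on the last one-liner, the boolean mask it builds is True exactly on the first 1.0 after each NaN run (a sketch on the trimmed DataFrame from the question):

mask = df.group_1.shift().fillna(0).lt(df.group_1)
print(mask.tolist())  # [True, False, False, False, True, False, False, True]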