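I have a DataFrame with a datetime column and one value column. I need to find the longest stretch of values between two nulls. In the example below, the longest stretch between two nulls is 4, running from timestamp '02-01-2018 00:05' to '02-01-2018 00:20'.
Here is my sample data: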
Datetime X
01-01-2018 00:00 1
01-01-2018 00:05 Nan
01-01-2018 00:10 2
01-01-2018 00:15 3
01-01-2018 00:20 2
01-01-2018 00:25 Nan
01-01-2018 00:30 Nan
01-01-2018 00:35 Nan
01-01-2018 00:40 4
02-01-2018 00:00 Nan
02-01-2018 00:05 2
02-01-2018 00:10 2
02-01-2018 00:15 2
02-01-2018 00:20 2
02-01-2018 00:25 Nan
02-01-2018 00:30 Nan
02-01-2018 00:35 3
02-01-2018 00:40 Nan
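Answer 0 (score: 1)
Assuming you only want the length of the longest stretch between two nulls, you can use Series.isnull() to find the indexes of the null values and a list comprehension to compute the gaps between consecutive nulls. This relies on the DataFrame having the default integer index (0, 1, 2, ...), because the index labels are subtracted from each other: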
indexes = df[df.X.isnull()].index
max([(indexes[i+1] - indexes[i]-1) for i in range(len(indexes)-1)])
>> 4
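To try the snippet above end to end, here is a minimal sketch that rebuilds the sample data from the question. It assumes the "Nan" entries are real missing values (np.nan) and keeps Datetime as plain strings; the names datetimes, values, gaps and longest are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the question's sample: two days, 00:00 to 00:40 in 5-minute steps.
minutes = [f"00:{m:02d}" for m in range(0, 45, 5)]
datetimes = [f"{day} {t}" for day in ("01-01-2018", "02-01-2018") for t in minutes]

# Assumption: "Nan" in the question stands for a real missing value.
nan = np.nan
values = [1, nan, 2, 3, 2, nan, nan, nan, 4,
          nan, 2, 2, 2, 2, nan, nan, 3, nan]

df = pd.DataFrame({"Datetime": datetimes, "X": values})

# Index labels of the null rows (default RangeIndex, so labels are positions).
indexes = df[df.X.isnull()].index

# Number of rows strictly between each pair of consecutive nulls.
gaps = [indexes[i + 1] - indexes[i] - 1 for i in range(len(indexes) - 1)]
longest = max(gaps)
print(longest)
```

Note that a run at the very start or end of the frame that is not enclosed by nulls on both sides is not counted by this approach.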
If you also want the timestamps:
indexes = df[df.X.isnull()].index
max_nulls = max([((indexes[i+1] - indexes[i]-1), indexes[i], indexes[i+1]) for i in range(len(indexes)-1)], key = lambda x: x[0])
max_nulls
>> (4, 9, 14)
df.loc[max_nulls[1]:max_nulls[2]]
Datetime X
9 02-01-2018 00:00 NaN
10 02-01-2018 00:05 2.0
11 02-01-2018 00:10 2.0
12 02-01-2018 00:15 2.0
13 02-01-2018 00:20 2.0
14 02-01-2018 00:25 NaN
If you only want the rows that bound the longest stretch of non-null values, use:
df.loc[[max_nulls[1], max_nulls[2]]]
Datetime X
9 02-01-2018 00:00 NaN
14 02-01-2018 00:25 NaN
or
df.loc[[max_nulls[1]+1, max_nulls[2]-1]]
Datetime X
10 02-01-2018 00:05 2.0
13 02-01-2018 00:20 2.0
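The steps above can also be sketched with positional lookups, so the result does not depend on the DataFrame having a default integer index. This is an alternative formulation, not the answer's own code: the helper name longest_run_between_nulls and its parameters are hypothetical, and the sample data again assumes "Nan" means a real missing value.

```python
import numpy as np
import pandas as pd


def longest_run_between_nulls(df, col="X", time_col="Datetime"):
    """Return (length, first_timestamp, last_timestamp) of the longest
    non-null run enclosed by two nulls, or None if there is no such run."""
    # Positions (not labels) of the null rows.
    null_pos = np.flatnonzero(df[col].isnull().to_numpy())
    if len(null_pos) < 2:
        return None
    # Rows strictly between each pair of consecutive nulls.
    gaps = np.diff(null_pos) - 1
    best = int(gaps.argmax())  # first maximum wins on ties
    if gaps[best] == 0:
        return None
    start = null_pos[best] + 1      # first non-null row of the run
    end = null_pos[best + 1] - 1    # last non-null row of the run
    return int(gaps[best]), df[time_col].iloc[start], df[time_col].iloc[end]


# Sample data from the question (Datetime kept as strings).
minutes = [f"00:{m:02d}" for m in range(0, 45, 5)]
datetimes = [f"{day} {t}" for day in ("01-01-2018", "02-01-2018") for t in minutes]
nan = np.nan
values = [1, nan, 2, 3, 2, nan, nan, nan, 4,
          nan, 2, 2, 2, 2, nan, nan, 3, nan]
df = pd.DataFrame({"Datetime": datetimes, "X": values})

result = longest_run_between_nulls(df)
print(result)
```

For the sample data this gives a length of 4 with the run starting at '02-01-2018 00:05' and ending at '02-01-2018 00:20', matching the rows selected with df.loc above.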