Identify the longest stretch of values between two nulls in a DataFrame column

Time: 2019-05-07 05:09:56

Tags: python pandas dataframe

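I have a DataFrame with a datetime column and one value column. I need to find the longest stretch of values between two nulls. In the example below, the longest stretch between two nulls has length 4, running from timestamp '02-01-2018 00:05' to '02-01-2018 00:20'.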

Here is my sample data:

Datetime            X
01-01-2018 00:00    1
01-01-2018 00:05    Nan
01-01-2018 00:10    2
01-01-2018 00:15    3
01-01-2018 00:20    2
01-01-2018 00:25    Nan
01-01-2018 00:30    Nan
01-01-2018 00:35    Nan
01-01-2018 00:40    4
02-01-2018 00:00    Nan
02-01-2018 00:05    2
02-01-2018 00:10    2
02-01-2018 00:15    2
02-01-2018 00:20    2
02-01-2018 00:25    Nan
02-01-2018 00:30    Nan
02-01-2018 00:35    3
02-01-2018 00:40    Nan

1 Answer:

Answer 0: (score: 1)

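Assuming you only want the length of the longest stretch between two nulls, you can use Series.isnull() to find the index labels of the nulls and a list comprehension to take the differences between consecutive ones. Note that subtracting index labels like this assumes the DataFrame has the default integer index (0, 1, 2, ...):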

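# Index labels of the rows where X is null
indexes = df[df.X.isnull()].index
# Number of rows strictly between each pair of consecutive nulls; keep the largest
max([(indexes[i+1] - indexes[i]-1) for i in range(len(indexes)-1)])
>> 4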
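
To try the snippet above end to end, here is a minimal, self-contained sketch. It rebuilds the sample data from the question by hand; the way the frame is constructed (string timestamps, a default integer index, and the names null_idx, gaps and longest) is an assumption made for illustration, not something the answer specifies.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Rebuild the question's sample data: two days, nine 5-minute steps each.
# Timestamps are kept as plain strings, as they appear in the question.
times = [f"00:{m:02d}" for m in range(0, 45, 5)]
datetimes = [f"{day} {t}" for day in ("01-01-2018", "02-01-2018") for t in times]
values = [1, nan, 2, 3, 2, nan, nan, nan, 4,
          nan, 2, 2, 2, 2, nan, nan, 3, nan]
df = pd.DataFrame({"Datetime": datetimes, "X": values})

# Index labels of the null rows (relies on the default 0..n-1 index).
null_idx = df[df.X.isnull()].index

# Rows strictly between each pair of consecutive nulls.
gaps = [null_idx[i + 1] - null_idx[i] - 1 for i in range(len(null_idx) - 1)]
longest = max(gaps)
print(longest)
```

Because only gaps between two nulls are counted, a run at the very start or end of the column (such as the single value in the first row) is ignored, which matches the question's wording.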

If you also want the timestamps:

indexes = df[df.X.isnull()].index
# Each tuple: (stretch length, index of the null before it, index of the null after it)
max_nulls = max([((indexes[i+1] - indexes[i]-1), indexes[i], indexes[i+1]) for i in range(len(indexes)-1)], key = lambda x: x[0])
max_nulls
>>(4, 9, 14)

df.loc[max_nulls[1]:max_nulls[2]]
     Datetime             X
9   02-01-2018 00:00    NaN
10  02-01-2018 00:05    2.0
11  02-01-2018 00:10    2.0
12  02-01-2018 00:15    2.0
13  02-01-2018 00:20    2.0
14  02-01-2018 00:25    NaN

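If you only want the timestamps that bound the longest stretch of non-null values, use the first call below for the two surrounding null rows, or the second for the first and last rows of the stretch itself: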

df.loc[[max_nulls[1], max_nulls[2]]]
    Datetime             X
9   02-01-2018 00:00    NaN
14  02-01-2018 00:25    NaN

df.loc[[max_nulls[1]+1, max_nulls[2]-1]]

      Datetime           X
10  02-01-2018 00:05    2.0
13  02-01-2018 00:20    2.0
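
The approach above subtracts index labels, so it depends on the default integer index. As a hedged alternative (not part of the original answer), the same idea can be written with row positions via np.flatnonzero and np.diff, which should work whatever the index labels are. The helper name longest_run_between_nulls and its parameters are hypothetical, chosen only for this sketch.

```python
import numpy as np
import pandas as pd


def longest_run_between_nulls(df, col="X", time_col="Datetime"):
    """Return (length, first_timestamp, last_timestamp) for the longest
    non-null run that has a null on both sides, or None if there is none."""
    # Row positions (not index labels) of the null entries.
    pos = np.flatnonzero(df[col].isna().to_numpy())
    if len(pos) < 2:
        return None
    # Rows strictly between each pair of consecutive nulls.
    gaps = np.diff(pos) - 1
    k = int(gaps.argmax())
    if gaps[k] == 0:
        return None
    start, end = pos[k] + 1, pos[k + 1] - 1
    return int(gaps[k]), df[time_col].iloc[start], df[time_col].iloc[end]


# Sample data from the question, with a non-default index on purpose.
nan = np.nan
times = [f"00:{m:02d}" for m in range(0, 45, 5)]
datetimes = [f"{day} {t}" for day in ("01-01-2018", "02-01-2018") for t in times]
values = [1, nan, 2, 3, 2, nan, nan, nan, 4,
          nan, 2, 2, 2, 2, nan, nan, 3, nan]
df = pd.DataFrame({"Datetime": datetimes, "X": values},
                  index=range(100, 118))

result = longest_run_between_nulls(df)
print(result)
```

On ties, argmax picks the earliest stretch, which mirrors the behaviour of max() with a key in the answer's version.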