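I have a DataFrame, and I need to find the longest sequence of empty rows, with its start and end dates, for further study. My index is a DatetimeIndex object, and the DataFrame looks like this: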
C Instalation N Serial Number D Register Read \
Z Ts Read
2016-12-25 00:00:00 PT0002000080299561BD 10101516046456 A+
2016-12-25 00:15:00 PT0002000080299561BD 10101516046456 A+
2016-12-25 00:30:00 PT0002000080299561BD 10101516046456 A+
2016-12-25 00:45:00 PT0002000080299561BD 10101516046456 A+
2016-12-25 01:00:00 PT0002000080299561BD 10101516046456 A+
M Read D Read Unit
Z Ts Read
2016-12-25 00:00:00 0,002 kWh
2016-12-25 00:15:00 0,002 kWh
2016-12-25 00:30:00 0,002 kWh
2016-12-25 00:45:00 0,002 kWh
2016-12-25 01:00:00 0,002 kWh
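NaN values may be scattered across the DataFrame's columns; that is not a problem. What I care about is when they occur in consecutive rows. In that case, for rows that have at least one NaN value, I want to know the start and end index of each run
and compute the time difference between the two. In the end, I want the longest such range.
Is this possible?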
Answer 0 (score: 0)
Not sure I understand the question 100%, but maybe this is what you want:
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, np.nan, np.nan, np.nan, 7, 8], "b": [1, 2, 3, np.nan, 5, 6, 7, 8]})
print(df)
a b
0 1.0 1.0
1 2.0 2.0
2 3.0 3.0
3 NaN NaN
4 NaN 5.0
5 NaN 6.0
6 7.0 7.0
7 8.0 8.0
# Mark NaN cells with 1.0 and every other cell with NaN
counts = df.isnull().astype(float)
counts[counts == 0] = np.nan
print(counts)
a b
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 1.0 1.0
4 1.0 NaN
5 1.0 NaN
6 NaN NaN
7 NaN NaN
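# cumsum skips NaN, so each column's count keeps growing across separate runs;
# it equals the run length here only because each column has a single run
runs = counts.cumsum()
print(runs)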
a b
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 1.0 1.0
4 2.0 NaN
5 3.0 NaN
6 NaN NaN
7 NaN NaN
runs.max(axis=0)
a 3.0
b 1.0
dtype: float64
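The counts above are per column, and the cumulative sum does not restart when a column has more than one run of NaN values. The question also asks for the start and end index of runs of rows that have at least one NaN. A minimal sketch of one way to get that, assuming the index is a sorted DatetimeIndex; the function name `longest_nan_run` and the sample data are hypothetical and not from the original post:

```python
import numpy as np
import pandas as pd


def longest_nan_run(df):
    """Return (start, end, n_rows) for the longest run of consecutive rows
    that contain at least one NaN, or None if no row contains a NaN."""
    has_nan = df.isnull().any(axis=1)  # True where the row has >= 1 NaN
    if not has_nan.any():
        return None
    # The flag changes value at every run boundary, so the cumulative sum
    # of those changes gives each run of equal flags its own block id.
    block_id = (has_nan != has_nan.shift()).cumsum()
    frame = pd.DataFrame({
        "label": df.index,
        "block": block_id.to_numpy(),
        "has_nan": has_nan.to_numpy(),
    })
    # Keep only NaN rows, then summarise each block by its first and last
    # index label and its number of rows.
    nan_rows = frame[frame["has_nan"]]
    summary = nan_rows.groupby("block")["label"].agg(["first", "last", "count"])
    best = summary.loc[summary["count"].idxmax()]  # first block wins on ties
    return best["first"], best["last"], int(best["count"])


# Sample data on a 15-minute grid, with two separate runs of NaN rows
index = pd.date_range("2016-12-25", periods=8, freq="15min")
df = pd.DataFrame(
    {
        "a": [1, np.nan, 3, np.nan, np.nan, np.nan, 7, 8],
        "b": [1, 2, 3, np.nan, 5, 6, 7, 8],
    },
    index=index,
)

start, end, n_rows = longest_nan_run(df)
print(start, end, n_rows, end - start)
```

Here `end - start` is the time between the first and last NaN row of the run, so a run of three rows on a 15-minute grid spans 30 minutes. If the span should also cover the last interval, one sampling step would need to be added.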