Question

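I'm looking at some weather data that includes hourly precipitation figures. I'm trying to pick out the entries where precipitation is > 0 (values range from 0 to about 0.4).

I thought I could do this with weather[weather.HourlyPrecip > 0] (weather here is a DataFrame), but apparently I'm wrong: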

fig, axes = plt.subplots(nrows=2, ncols=2)
weather[weather.HourlyPrecip > 0].HourlyPrecip.resample('1D', how='sum').plot(ax=axes[0,0]); axes[0,0].set_title('>0')
weather[weather.HourlyPrecip >= 0].HourlyPrecip.resample('1D', how='sum').plot(ax=axes[1,0]); axes[1,0].set_title('>=0')
weather[weather.HourlyPrecip == 0].HourlyPrecip.resample('1D', how='sum').plot(ax=axes[0,1]); axes[0,1].set_title('==0')
weather.HourlyPrecip.resample('1D', how='sum').plot(ax=axes[1,1]); axes[1,1].set_title('all')

...which produces:

chart

What's going on here?

Update: here's roughly what the data looks like:

In [215]: weather.HourlyPrecip
Out[215]: Date_Time
          2013-12-01 00:51:00    0
          2013-12-01 01:20:00    0
          2013-12-01 01:51:00    0
          2013-12-01 02:51:00    0
          2013-12-01 03:10:00    0
          2013-12-01 03:49:00    0
          2013-12-01 03:51:00    0
          2013-12-01 04:25:00    0
          2013-12-01 04:35:00    0
          2013-12-01 04:51:00    0
          2013-12-01 05:51:00    0
          2013-12-01 06:00:00    0
          2013-12-01 06:09:00    0
          2013-12-01 06:40:00    0
          2013-12-01 06:51:00    0
          ...
          2013-12-31 09:51:00    0
          2013-12-31 10:51:00    0
          2013-12-31 11:51:00    0
          2013-12-31 12:51:00    0
          2013-12-31 13:51:00    0
          2013-12-31 14:51:00    0
          2013-12-31 15:51:00    0
          2013-12-31 16:51:00    0
          2013-12-31 17:51:00    0
          2013-12-31 18:51:00    0
          2013-12-31 19:51:00    0
          2013-12-31 20:51:00    0
          2013-12-31 21:51:00    0
          2013-12-31 22:51:00    0
          2013-12-31 23:51:00    0
          Name: HourlyPrecip, Length: 1018

All the distinct values:

In [216]: np.unique(weather.HourlyPrecip.ravel())
Out[216]: array([ 0.  ,  0.01,  0.02,  0.03,  0.04,  0.05,  0.06,  0.07,  0.08,
          0.09,  0.1 ,  0.12,  0.13,  0.19,  0.2 ,  0.23,  0.24,  0.28,  0.38])

(The column is all floats.)

Answer 1

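Your initial assumption is correct: df[df.precip > 0] does work. It's your test that is flawed. After filtering, some days have no hourly rows left, so when you resample you end up with a bunch of np.nan values for those days. Lines are not drawn through NaN, so the plot looks disjointed.

Try something like resampled_data.fillna(0).plot() and I think you'll see what you were expecting.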
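To make the effect concrete, here is a minimal sketch on made-up data (the series below is a hypothetical stand-in for weather.HourlyPrecip, not the asker's file). One assumption to flag: the how='sum' argument used in the question has since been removed from pandas, and in recent versions .resample(...).sum() returns 0 for empty days by default, so the sketch passes min_count=1 to reproduce the NaN behaviour described above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 96 hourly readings (4 days), rain on day 1 and day 4 only.
idx = pd.Timestamp("2013-12-01") + pd.to_timedelta(np.arange(96), unit="h")
precip = pd.Series(0.0, index=idx, name="HourlyPrecip")
precip.iloc[5] = 0.12    # 2013-12-01 05:00
precip.iloc[80] = 0.38   # 2013-12-04 08:00

# The boolean filter works as expected: only the two wet hours remain.
wet = precip[precip > 0]

# Daily totals of the filtered data. Days 2 and 3 have no rows left, so with
# min_count=1 their sum is NaN instead of 0 (mimicking the old how='sum').
daily = wet.resample("1D").sum(min_count=1)

# Replace the gaps with 0 before plotting so the line is continuous.
filled = daily.fillna(0)

print(daily)
print(filled)
# filled.plot() would then draw an unbroken line across all four days.
```

With the default min_count=0 the empty days already come out as 0, which is probably why the same plots look continuous on a recent pandas install.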

Pandas filtering gives some strange results
