Question

I'm trying to get the first element of each bin of a time series, but there seems to be a problem when I use apply. For example:

import pandas as pd

a = pd.Series(['2014-01-01', '2014-01-02', '2014-01-03', '2014-01-04', '2014-01-09'])
a = pd.to_datetime(a).reset_index().set_index(0)
a

            index
0   
2014-01-01  0
2014-01-02  1
2014-01-03  2
2014-01-04  3
2014-01-09  4

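When I run a.resample('2D').apply(lambda x: x[0]), I get IndexError: index out of bounds. I suspected this was because I was asking for element 0 of an empty series, but that does not seem to be the case; rather, using .apply on a resample seems to be the problem. I say this because of the following result: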

a.resample('2D').apply(lambda x: min(x))

            index
0   
2014-01-01  index
2014-01-03  index
2014-01-05  index
2014-01-07  index
2014-01-09  index

For the record, a.resample('2D').apply(lambda x: x.min()) works fine. Any idea how to get the first item of every two-day bin, and return NaN when a bin has no items?

Answer 1

It seems you need Resampler.first:

print(a.resample('2D').first())
            index
0                
2014-01-01    0.0
2014-01-03    2.0
2014-01-05    NaN
2014-01-07    NaN
2014-01-09    4.0
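
For anyone who wants to run this without the setup from the question, here is a minimal self-contained sketch. It builds the same data directly as a one-column DataFrame (the variable names `dates` and `first` are only for illustration) and assumes a reasonably recent pandas.

```python
import pandas as pd

# Same dates and values as in the question, built directly as a DataFrame.
dates = pd.to_datetime(
    ['2014-01-01', '2014-01-02', '2014-01-03', '2014-01-04', '2014-01-09']
)
a = pd.DataFrame({'index': range(5)}, index=dates)

# First value in each 2-day bin; bins with no rows come back as NaN.
first = a.resample('2D').first()
print(first)
```

Because the empty bins are filled with NaN, the column is returned as float rather than int.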

Answer 2

Is this what you want?

a.resample('2D').first()
Out[251]: 
            index
0                
2014-01-01    0.0
2014-01-03    2.0
2014-01-05    NaN
2014-01-07    NaN
2014-01-09    4.0

The reason x[0] doesn't work is that there are gaps in the date range, so some bins contain 0 rows. You can check this by doing the following:

a.resample('2D').apply(lambda x: len(x))
Out[257]: 
            index
0                
2014-01-01      2
2014-01-03      2
2014-01-05      0
2014-01-07      0
2014-01-09      1
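
As a side note, the same row counts can probably be obtained without a lambda by using Resampler.size. The sketch below is self-contained and rebuilds the data from the question; the names `dates` and `sizes` are only for illustration.

```python
import pandas as pd

# Same dates and values as in the question.
dates = pd.to_datetime(
    ['2014-01-01', '2014-01-02', '2014-01-03', '2014-01-04', '2014-01-09']
)
a = pd.DataFrame({'index': range(5)}, index=dates)

# Number of rows that fall into each 2-day bin, including the empty ones.
sizes = a.resample('2D').size()
print(sizes)
```

The two bins starting on 2014-01-05 and 2014-01-07 have a size of 0, which is where x[0] has nothing to return.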

The way to fix this is to add a check:

a.resample('2D').apply(lambda x: x[0] if len(x)>0 else np.nan)
Out[258]: 
            index
0                
2014-01-01    0.0
2014-01-03    2.0
2014-01-05    NaN
2014-01-07    NaN
2014-01-09    4.0
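
Below is a self-contained variant of the same guard, offered as a sketch rather than the definitive fix. It makes two substitutions that are not in the answer above: it resamples a Series instead of the one-column DataFrame, and it uses .iloc[0] so that the lookup is explicitly positional instead of relying on how x[0] is interpreted on a datetime-indexed Series. The helper name `first_or_nan` is hypothetical.

```python
import numpy as np
import pandas as pd

# Same dates and values as in the question, as a Series for simplicity.
dates = pd.to_datetime(
    ['2014-01-01', '2014-01-02', '2014-01-03', '2014-01-04', '2014-01-09']
)
s = pd.Series(range(5), index=dates, name='index')

def first_or_nan(x):
    # Hypothetical helper: return the first value of the bin,
    # or NaN when the bin contains no rows.
    return x.iloc[0] if len(x) > 0 else np.nan

result = s.resample('2D').apply(first_or_nan)
print(result)
```

For this particular task the built-in first() is simpler; the guard pattern is mainly useful when the per-bin function has no built-in equivalent.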

Error when trying to apply a function to a pandas time series resample
