Question

I have a numpy array filled with cftime objects (dtype=object).

In [1]: a
Out[1]: 
array([cftime.DatetimeNoLeap(2000, 1, 1, 11, 29, 59, 999996, 5, 1),
       cftime.DatetimeNoLeap(2000, 1, 2, 11, 29, 59, 999996, 6, 2),
       cftime.DatetimeNoLeap(2000, 1, 3, 11, 29, 59, 999996, 0, 3)],
      dtype=object)

In [2]: type(a[0])
Out[2]: cftime._cftime.DatetimeNoLeap

Each of these objects has a month attribute.

In [66]: a[0].month
Out[66]: 1

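I would like to get a new numpy array of the same shape, filled with this attribute for each element of the original array. Something like b = a.month. This obviously fails, because a is a numpy array and has no month attribute. How can I get this result?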

PS: Of course I could do this with a plain Python loop, but I would prefer a purely numpy approach:

b=np.zeros_like(a, dtype=int)
for i in range(a.size):
    b[i] = a[i].month

Answer 1

You can use np.vectorize to map a function over every element of the array. In this case, define a small lambda, lambda x: x.month, that extracts the month from each entry:

np.vectorize(lambda x: x.month)(a)
array([1, 1, 1])
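A minimal sketch of the same idea with an explicit output type. It assumes plain datetime.date objects as stand-ins for cftime.DatetimeNoLeap (both expose a month attribute), and get_month is a name made up for the example. Passing otypes means np.vectorize does not have to call the function on the first element just to infer the result dtype, and the output keeps the shape of the input:

```python
import datetime
import numpy as np

# Stand-in data: datetime.date instead of cftime.DatetimeNoLeap (assumption).
a = np.array([datetime.date(2000, 1, 1),
              datetime.date(2000, 2, 2),
              datetime.date(2000, 3, 3)], dtype=object)

# otypes=[int] fixes the output dtype up front.
get_month = np.vectorize(lambda x: x.month, otypes=[int])

b = get_month(a)                   # same shape as a: (3,)
b2d = get_month(a.reshape(3, 1))   # also works for 2-D input: (3, 1)

print(b)
print(b2d.shape)
```

Note that np.vectorize is a convenience wrapper; the function is still called once per element in Python.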

Answer 2

I don't have cftime installed, so I'll demonstrate with regular datetime objects.

First create an array of dates. A lazy way to do that is to use numpy's own datetime dtype:

In [599]: arr = np.arange('2000-01-11','2000-12-31',dtype='datetime64[D]')
In [600]: arr.shape
Out[600]: (355,)

From that, make an object dtype array:

In [601]: arrO = arr.astype(object)

And a list of datetime objects:

In [602]: alist = arr.tolist()

Timing a regular list comprehension:

In [603]: timeit [d.month for d in alist]
20.1 µs ± 62.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

A list comprehension over an object dtype array is usually a bit slower (but faster than a list comprehension over a regular array):

In [604]: timeit [d.month for d in arrO]
30.7 µs ± 266 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

frompyfunc is slower here; in other cases I've seen it run about 2x faster than a list comprehension:

In [605]: timeit np.frompyfunc(lambda x: x.month, 1,1)(arrO)
51 µs ± 32.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
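One detail worth keeping in mind when reading that timing: np.frompyfunc always returns an object dtype array, so getting integers out takes an extra astype step. A small sketch, again using datetime.date as a stand-in for the cftime objects (an assumption), with month_of as an illustrative name:

```python
import datetime
import numpy as np

# Object-dtype array of dates (stand-in for cftime objects).
arrO = np.array([datetime.date(2000, m, 15) for m in (1, 6, 12)], dtype=object)

# frompyfunc(func, nin, nout): one input, one output.
month_of = np.frompyfunc(lambda d: d.month, 1, 1)

raw = month_of(arrO)        # result has dtype=object
months = raw.astype(int)    # convert to a numeric array

print(raw.dtype, months.dtype)
print(months)
```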

vectorize is (almost) always slower than frompyfunc (even though it uses frompyfunc for the actual iteration):

In [606]: timeit np.vectorize(lambda x: x.month, otypes=[int])(arrO)
76.7 µs ± 123 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Here are samples of the arrays and the list:

In [607]: arr[:5]
Out[607]: 
array(['2000-01-11', '2000-01-12', '2000-01-13', '2000-01-14',
       '2000-01-15'], dtype='datetime64[D]')
In [608]: arrO[:5]
Out[608]: 
array([datetime.date(2000, 1, 11), datetime.date(2000, 1, 12),
       datetime.date(2000, 1, 13), datetime.date(2000, 1, 14),
       datetime.date(2000, 1, 15)], dtype=object)
In [609]: alist[:5]
Out[609]: 
[datetime.date(2000, 1, 11),
 datetime.date(2000, 1, 12),
 datetime.date(2000, 1, 13),
 datetime.date(2000, 1, 14),
 datetime.date(2000, 1, 15)]

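frompyfunc and vectorize are best used when you need the generality of broadcasting and multidimensional arrays. For 1d arrays, a list comprehension is almost always better.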
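To illustrate the multidimensional case, here is a sketch (with datetime.date stand-ins and made-up variable names, both assumptions) that compares frompyfunc on a 2-D object array with a list comprehension, which has to flatten the array and restore the shape by hand:

```python
import datetime
import numpy as np

# 2x3 object array of dates, one per month from January to June.
dates = [datetime.date(2000, m, 1) for m in range(1, 7)]
grid = np.array(dates, dtype=object).reshape(2, 3)

# frompyfunc applies elementwise and keeps the input shape.
months_ufunc = np.frompyfunc(lambda d: d.month, 1, 1)(grid).astype(int)

# The list comprehension needs an explicit ravel and reshape.
months_list = np.array([d.month for d in grid.ravel()]).reshape(grid.shape)

print(months_ufunc)
print(np.array_equal(months_ufunc, months_list))
```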

To be fair to frompyfunc, I should return an array from the list comprehension:

In [610]: timeit np.array([d.month for d in arrO])
50.1 µs ± 36.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

To get the best speed with dates in numpy, use the datetime64 dtype instead of object dtype. That makes more use of compiled numpy code.

In [611]: timeit arr = np.arange('2000-01-11','2000-12-31',dtype='datetime64[D]'
     ...: )
3.16 µs ± 51 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [616]: arr.astype('datetime64[M]')[::60]
Out[616]: 
array(['2000-01', '2000-03', '2000-05', '2000-07', '2000-09', '2000-11'],
      dtype='datetime64[M]')
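If the month number itself is wanted (1 to 12) rather than a datetime64[M] value, one common trick is to view the month-resolution values as integers, which count months since 1970-01, and take them modulo 12. A sketch under that assumption; note that datetime64 follows the standard Gregorian calendar, so this does not carry over directly to cftime's noleap calendar:

```python
import numpy as np

arr = np.arange('2000-01-11', '2000-12-31', dtype='datetime64[D]')

# datetime64[M] as int = months elapsed since 1970-01; 1970-01 maps to 0.
months = arr.astype('datetime64[M]').astype(int) % 12 + 1

# Cross-check against the pure-Python route through datetime.date objects.
expected = np.array([d.month for d in arr.astype(object)])

print(months[:3], months[-3:])
print(np.array_equal(months, expected))
```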

Accessing attributes of elements in a numpy array
