I want to compute a histogram over an (N, 3) NumPy array whose three columns hold longitude, latitude, and a timestamp, as shown below:
array([[116.45565032958984, 39.889976501464844,
datetime.datetime(2012, 10, 1, 6, 32, 39)],
[116.45565032958984, 39.889984130859375,
datetime.datetime(2012, 10, 1, 6, 33, 31)],
[116.45565032958984, 39.889984130859375,
datetime.datetime(2012, 10, 1, 6, 33, 33)],
[116.45565032958984, 39.889984130859375,
datetime.datetime(2012, 10, 1, 6, 33, 37)],
[116.45561981201172, 39.89040756225586,
datetime.datetime(2012, 10, 1, 6, 34, 42)],
[116.45561981201172, 39.890411376953125,
datetime.datetime(2012, 10, 1, 6, 36, 40)],
[116.45549774169922, 39.8941650390625,
datetime.datetime(2012, 10, 1, 6, 37, 54)],
[116.45556640625, 39.92431640625,
datetime.datetime(2012, 10, 1, 6, 38, 57)],
[116.45578002929688, 39.93780517578125,
datetime.datetime(2012, 10, 1, 6, 42, 10)],
[116.44468688964844, 39.93989944458008,
datetime.datetime(2012, 10, 1, 6, 43, 21)]], dtype=object)
I tried using np.histogramdd like this:
import numpy as np
np.histogramdd(my_data, bins = (lon_bin_num, lat_bin_num, time_bin_num),
range = [[lon_min, lon_max], [lat_min, lat_max],
[start_datetime, end_datetime]])
but it raises a TypeError:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-271-58c94eecf21d> in <module>()
1 np.histogramdd(tmp2, bins = (lon_bin_num, lat_bin_num, time_bin_num),
----> 2 range = [[lon_min, lon_max], [lat_min, lat_max], [start_datetime, end_datetime]])
/*/*/anaconda/lib/python2.7/site-packages/numpy/lib/function_base.pyc in histogramdd(sample, bins, range, normed, weights)
318 smax = zeros(D)
319 for i in arange(D):
--> 320 smin[i], smax[i] = range[i]
321
322 # Make sure the bins have a finite width.
TypeError: float() argument must be a string or a number
I know the datetime objects are what cause the error, but how can I fix it, or more generally, how can I compute a histogram over a NumPy ndarray with dtype=object?
Answer 0 (score: 1)
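Many NumPy functions do not work on arrays of dtype object. To use np.histogramdd you need an array of shape (N, D), so a structured array does not help here either (a structured array has shape (N,), which drops the D dimension). What you need is an array with a single homogeneous, non-object dtype. Since the first two columns are floats, let's express the third column as floats too.
You can first convert the dates to NumPy's native datetime64[s] dtype: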
In [102]: dates = np.array(my_data[:, 2],dtype='<M8[s]')
In [103]: dates
Out[103]:
array(['2012-10-01T02:32:39-0400', '2012-10-01T02:33:31-0400',
'2012-10-01T02:33:33-0400', '2012-10-01T02:33:37-0400',
'2012-10-01T02:34:42-0400', '2012-10-01T02:36:40-0400',
'2012-10-01T02:37:54-0400', '2012-10-01T02:38:57-0400',
'2012-10-01T02:42:10-0400', '2012-10-01T02:43:21-0400'], dtype='datetime64[s]')
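(The -0400 offset in this output is how older NumPy versions display datetime64 values in local time; the stored values themselves are not shifted.)
Then use astype to convert the datetime64[s] values to floats, which gives seconds since the Unix epoch: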
In [104]: float_dates = dates.astype('float')
In [105]: float_dates
Out[105]:
array([ 1.34907316e+09, 1.34907321e+09, 1.34907321e+09,
1.34907322e+09, 1.34907328e+09, 1.34907340e+09,
1.34907347e+09, 1.34907354e+09, 1.34907373e+09,
1.34907380e+09])
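The same seconds-since-epoch values can also be computed without going through datetime64. The following is a minimal sketch, assuming the timestamps are naive datetime.datetime objects that represent UTC; the helper name to_epoch_seconds is hypothetical and not part of NumPy.

```python
import datetime

import numpy as np

# Reference point for the conversion; assumes the timestamps are naive UTC.
EPOCH = datetime.datetime(1970, 1, 1)


def to_epoch_seconds(timestamps):
    """Return float64 seconds since 1970-01-01 for naive datetime objects."""
    return np.array([(t - EPOCH).total_seconds() for t in timestamps],
                    dtype=float)


# The first two timestamps from the question's data.
sample = [datetime.datetime(2012, 10, 1, 6, 32, 39),
          datetime.datetime(2012, 10, 1, 6, 33, 31)]
seconds = to_epoch_seconds(sample)
```

This is slower than the datetime64 route for large arrays because it loops in Python, but it makes the meaning of the float values explicit.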
Now build a new array with dtype float:
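# Copy the two float columns and the converted timestamps into a float array.
arr = np.empty_like(my_data, dtype='float')
arr[:, 0:2] = my_data[:, 0:2]
arr[:, 2] = float_dates
# xedges, yedges, zedges are the bin edges for each column; zedges must be
# expressed in seconds since the epoch, like float_dates.
hist, edges = np.histogramdd(arr, bins=(xedges, yedges, zedges))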
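If you would rather pass bin counts and a range, as in the question, the datetime bounds have to be converted to floats in the same way. The following is a rough end-to-end sketch under that assumption: the bin counts, the bounds, and the helper to_seconds are made up for illustration, and my_data here is a four-row stand-in for the array in the question.

```python
import datetime

import numpy as np

# Four-row stand-in for my_data from the question (object dtype, shape (N, 3)).
my_data = np.array([
    [116.45565032958984, 39.889976501464844,
     datetime.datetime(2012, 10, 1, 6, 32, 39)],
    [116.45561981201172, 39.89040756225586,
     datetime.datetime(2012, 10, 1, 6, 34, 42)],
    [116.45578002929688, 39.93780517578125,
     datetime.datetime(2012, 10, 1, 6, 42, 10)],
    [116.44468688964844, 39.93989944458008,
     datetime.datetime(2012, 10, 1, 6, 43, 21)],
], dtype=object)


def to_seconds(values):
    """Convert datetimes to float seconds since the epoch via datetime64[s]."""
    return np.array(values, dtype='datetime64[s]').astype('int64').astype(float)


# Homogeneous float array: longitude, latitude, seconds since the epoch.
arr = np.empty(my_data.shape, dtype=float)
arr[:, :2] = my_data[:, :2].astype(float)
arr[:, 2] = to_seconds(my_data[:, 2])

# Hypothetical bounds and bin counts, chosen only for this example.
lon_min, lon_max = 116.44, 116.46
lat_min, lat_max = 39.88, 39.94
start_datetime = datetime.datetime(2012, 10, 1, 6, 30)
end_datetime = datetime.datetime(2012, 10, 1, 6, 45)
t_min, t_max = to_seconds([start_datetime, end_datetime])

hist, edges = np.histogramdd(
    arr,
    bins=(2, 3, 3),
    range=[[lon_min, lon_max], [lat_min, lat_max], [t_min, t_max]],
)
```

Here hist has one axis per column, and edges[2] holds the time bin edges as float seconds.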
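This gives you a histogram, but you will probably also want to reinterpret the floats as dates afterwards. You can do that with astype as well. To get datetime64[s] values: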
In [99]: float_dates.astype('<M8[s]')
Out[99]:
array(['2012-10-01T02:32:39-0400', '2012-10-01T02:33:31-0400',
'2012-10-01T02:33:33-0400', '2012-10-01T02:33:37-0400',
'2012-10-01T02:34:42-0400', '2012-10-01T02:36:40-0400',
'2012-10-01T02:37:54-0400', '2012-10-01T02:38:57-0400',
'2012-10-01T02:42:10-0400', '2012-10-01T02:43:21-0400'], dtype='datetime64[s]')
To get Python datetime.datetime objects:
In [116]: float_dates.astype('<M8[s]').tolist()
Out[116]:
[datetime.datetime(2012, 10, 1, 6, 32, 39),
datetime.datetime(2012, 10, 1, 6, 33, 31),
datetime.datetime(2012, 10, 1, 6, 33, 33),
datetime.datetime(2012, 10, 1, 6, 33, 37),
datetime.datetime(2012, 10, 1, 6, 34, 42),
datetime.datetime(2012, 10, 1, 6, 36, 40),
datetime.datetime(2012, 10, 1, 6, 37, 54),
datetime.datetime(2012, 10, 1, 6, 38, 57),
datetime.datetime(2012, 10, 1, 6, 42, 10),
datetime.datetime(2012, 10, 1, 6, 43, 21)]
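The same conversion applies to the time bin edges returned by np.histogramdd, which are floats as well. A small sketch, assuming hypothetical edges in seconds since the epoch; the values are rounded to whole seconds and cast through int64 before being reinterpreted as datetime64[s].

```python
import numpy as np

# Hypothetical time bin edges in float seconds since the epoch,
# standing in for the third entry of the edges list from np.histogramdd.
time_edges = np.linspace(1349073000.0, 1349073900.0, 4)

# Round to whole seconds, then reinterpret the integers as datetime64[s].
edge_dates = np.round(time_edges).astype('int64').astype('datetime64[s]')

# Plain Python datetime.datetime objects, if those are more convenient.
edge_datetimes = edge_dates.tolist()
```

Rounding first avoids silently truncating edges that fall between whole seconds.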