How to use histogramdd to compute a histogram on a numpy array with dtype object?

Posted: 2014-12-08 06:59:33

Tags: python numpy histogram python-datetime multidimensional-array

I want to compute a histogram over an (N, 3) numpy array whose three columns hold longitude, latitude and timestamp, like this:

array([[116.45565032958984, 39.889976501464844,
        datetime.datetime(2012, 10, 1, 6, 32, 39)],
       [116.45565032958984, 39.889984130859375,
        datetime.datetime(2012, 10, 1, 6, 33, 31)],
       [116.45565032958984, 39.889984130859375,
        datetime.datetime(2012, 10, 1, 6, 33, 33)],
       [116.45565032958984, 39.889984130859375,
        datetime.datetime(2012, 10, 1, 6, 33, 37)],
       [116.45561981201172, 39.89040756225586,
        datetime.datetime(2012, 10, 1, 6, 34, 42)],
       [116.45561981201172, 39.890411376953125,
        datetime.datetime(2012, 10, 1, 6, 36, 40)],
       [116.45549774169922, 39.8941650390625,
        datetime.datetime(2012, 10, 1, 6, 37, 54)],
       [116.45556640625, 39.92431640625,
        datetime.datetime(2012, 10, 1, 6, 38, 57)],
       [116.45578002929688, 39.93780517578125,
        datetime.datetime(2012, 10, 1, 6, 42, 10)],
       [116.44468688964844, 39.93989944458008,
        datetime.datetime(2012, 10, 1, 6, 43, 21)]], dtype=object)

I tried to use np.histogramdd like this:

import numpy as np
np.histogramdd(my_data, bins = (lon_bin_num, lat_bin_num, time_bin_num), 
                range = [[lon_min, lon_max], [lat_min, lat_max], 
                [start_datetime, end_datetime]])

and got a TypeError:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-271-58c94eecf21d> in <module>()
      1 np.histogramdd(tmp2, bins = (lon_bin_num, lat_bin_num, time_bin_num),
----> 2                range = [[lon_min, lon_max], [lat_min, lat_max], [start_datetime, end_datetime]])

/*/*/anaconda/lib/python2.7/site-packages/numpy/lib/function_base.pyc in histogramdd(sample, bins, range, normed, weights)
    318         smax = zeros(D)
    319         for i in arange(D):
--> 320             smin[i], smax[i] = range[i]
    321 
    322     # Make sure the bins have a finite width.

TypeError: float() argument must be a string or a number

I know the datetime objects are what cause the error, but how can I fix it, or how can I compute a histogram on a numpy ndarray with dtype=object?

1 answer:

Answer 0 (score: 1)

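Many NumPy functions do not work on arrays of dtype object. To use np.histogramdd you need an array of shape (N, D), so a structured array does not help here either (a structured array has shape (N,), i.e. it drops the D dimension). What you need is a homogeneous array of a non-object dtype. Since the first two columns are already floats, let's represent the third column as floats too.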
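A minimal sketch of the shape argument, using two rows from the question. The names `structured` and `numeric` are illustrative only: the structured array holds one record per row and so has shape (N,), while stacking the columns as floats restores the (N, 3) layout histogramdd expects.

```python
import datetime
import numpy as np

# Two sample rows in the question's layout: lon, lat, datetime (object dtype)
my_data = np.array([
    [116.45565032958984, 39.889976501464844, datetime.datetime(2012, 10, 1, 6, 32, 39)],
    [116.44468688964844, 39.93989944458008, datetime.datetime(2012, 10, 1, 6, 43, 21)],
], dtype=object)

# A structured array stores one record per row, so its shape is (N,), not (N, 3)
structured = np.array(
    [tuple(row) for row in my_data],
    dtype=[('lon', 'f8'), ('lat', 'f8'), ('time', 'datetime64[s]')],
)
print(my_data.shape, structured.shape)  # (2, 3) (2,)

# A homogeneous float array keeps the (N, D) shape that histogramdd expects
numeric = np.column_stack([
    structured['lon'],
    structured['lat'],
    structured['time'].astype(np.int64).astype(float),  # seconds since the epoch
])
print(numeric.shape, numeric.dtype)  # (2, 3) float64
```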

You can convert the dates to NumPy's native datetime64[s] dtype:

In [102]: dates = np.array(my_data[:, 2],dtype='<M8[s]')

In [103]: dates
Out[103]: 
array(['2012-10-01T02:32:39-0400', '2012-10-01T02:33:31-0400',
       '2012-10-01T02:33:33-0400', '2012-10-01T02:33:37-0400',
       '2012-10-01T02:34:42-0400', '2012-10-01T02:36:40-0400',
       '2012-10-01T02:37:54-0400', '2012-10-01T02:38:57-0400',
       '2012-10-01T02:42:10-0400', '2012-10-01T02:43:21-0400'], dtype='datetime64[s]')

Then use astype to convert the datetime64[s] values to floats:

In [104]: float_dates = dates.astype('float')

In [105]: float_dates
Out[105]: 
array([  1.34907316e+09,   1.34907321e+09,   1.34907321e+09,
         1.34907322e+09,   1.34907328e+09,   1.34907340e+09,
         1.34907347e+09,   1.34907354e+09,   1.34907373e+09,
         1.34907380e+09])
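To check what these floats mean, here is a small sketch assuming the naive datetimes are meant as UTC: each value is the number of seconds since 1970-01-01. Going through int64 first is a conservative variation on the answer's direct `astype('float')`, not the answer's own code.

```python
import datetime
import numpy as np

dates = np.array([datetime.datetime(2012, 10, 1, 6, 32, 39),
                  datetime.datetime(2012, 10, 1, 6, 43, 21)], dtype='datetime64[s]')

# Whole seconds since 1970-01-01, then converted to float for histogramdd
float_dates = dates.astype(np.int64).astype(float)

# The same number computed by hand with the standard library
epoch = datetime.datetime(1970, 1, 1)
manual = (datetime.datetime(2012, 10, 1, 6, 32, 39) - epoch).total_seconds()
print(float_dates[0] == manual)         # True
print(float_dates[1] - float_dates[0])  # 642.0
```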

Now build a new array with dtype float:

arr = np.empty_like(my_data, dtype='float')
arr[:, 0:2] = my_data[:, 0:2]
arr[:, 2] = float_dates

hist, edges = np.histogramdd(arr, bins=(xedges, yedges, zedges))
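A runnable sketch of the whole pipeline on four rows from the question. It also addresses the `range` argument from the question: the datetime bounds must be converted to float seconds the same way as the data. The bin counts, the lon/lat bounds, `start_datetime`/`end_datetime` values and the helper `to_seconds` are hypothetical placeholders, not values from the question.

```python
import datetime
import numpy as np

my_data = np.array([
    [116.45565032958984, 39.889976501464844, datetime.datetime(2012, 10, 1, 6, 32, 39)],
    [116.45561981201172, 39.890411376953125, datetime.datetime(2012, 10, 1, 6, 36, 40)],
    [116.45578002929688, 39.93780517578125, datetime.datetime(2012, 10, 1, 6, 42, 10)],
    [116.44468688964844, 39.93989944458008, datetime.datetime(2012, 10, 1, 6, 43, 21)],
], dtype=object)

# Homogeneous float (N, 3) array: lon, lat, seconds since the epoch
arr = np.empty_like(my_data, dtype=float)
arr[:, 0:2] = my_data[:, 0:2]
arr[:, 2] = np.array(my_data[:, 2], dtype='datetime64[s]').astype(np.int64)

def to_seconds(d):
    """Hypothetical helper: datetime.datetime -> float seconds since the epoch."""
    return float(np.datetime64(d, 's').astype(np.int64))

# Placeholder bounds; the datetime range is converted like the data column
start_datetime = datetime.datetime(2012, 10, 1, 6, 30)
end_datetime = datetime.datetime(2012, 10, 1, 6, 45)

hist, edges = np.histogramdd(
    arr,
    bins=(2, 2, 3),
    range=[[116.44, 116.46], [39.88, 39.94],
           [to_seconds(start_datetime), to_seconds(end_datetime)]],
)
print(hist.shape)  # (2, 2, 3)
print(hist.sum())  # 4.0
```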

This gives you a histogram, but you may also want to reinterpret the floats as dates. You can do that with astype. To get datetime64[s]:

In [99]: float_dates.astype('<M8[s]')
Out[99]: 
array(['2012-10-01T02:32:39-0400', '2012-10-01T02:33:31-0400',
       '2012-10-01T02:33:33-0400', '2012-10-01T02:33:37-0400',
       '2012-10-01T02:34:42-0400', '2012-10-01T02:36:40-0400',
       '2012-10-01T02:37:54-0400', '2012-10-01T02:38:57-0400',
       '2012-10-01T02:42:10-0400', '2012-10-01T02:43:21-0400'], dtype='datetime64[s]')
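The same idea applied to the time-axis bin edges returned by histogramdd, as a sketch. The `time_edges` values are hypothetical; rounding before the int64 cast is an assumption added here because computed bin edges need not be whole seconds and the cast truncates.

```python
import numpy as np

# Hypothetical time-axis edges in float seconds, e.g. edges[2] from np.histogramdd
time_edges = np.array([1349073000.0, 1349073300.0, 1349073600.0, 1349073900.0])

# Round to whole seconds, then reinterpret as datetime64[s]
edge_dates = np.round(time_edges).astype(np.int64).astype('datetime64[s]')
print(edge_dates[0])  # 2012-10-01T06:30:00
```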

To get Python datetime.datetime objects:

In [116]: float_dates.astype('<M8[s]').tolist()
Out[116]: 
[datetime.datetime(2012, 10, 1, 6, 32, 39),
 datetime.datetime(2012, 10, 1, 6, 33, 31),
 datetime.datetime(2012, 10, 1, 6, 33, 33),
 datetime.datetime(2012, 10, 1, 6, 33, 37),
 datetime.datetime(2012, 10, 1, 6, 34, 42),
 datetime.datetime(2012, 10, 1, 6, 36, 40),
 datetime.datetime(2012, 10, 1, 6, 37, 54),
 datetime.datetime(2012, 10, 1, 6, 38, 57),
 datetime.datetime(2012, 10, 1, 6, 42, 10),
 datetime.datetime(2012, 10, 1, 6, 43, 21)]
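A short sketch of the last step: with the [s] unit, `tolist()` returns naive datetime.datetime objects (other units can behave differently; nanosecond values, for example, come back as integers). The two edge values are hypothetical.

```python
import datetime
import numpy as np

edge_dates = np.array(['2012-10-01T06:30:00', '2012-10-01T06:45:00'], dtype='datetime64[s]')

# Second-resolution datetime64 values convert to datetime.datetime
py_dates = edge_dates.tolist()
print(py_dates[0])  # 2012-10-01 06:30:00
```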