I have a pandas DataFrame like this:
date price volume
0 2017-10-24 01:00:07.870000 51.90 1
1 2017-10-24 01:00:10.167000 51.90 1
2 2017-10-24 01:00:11.370000 51.89 -1
3 2017-10-24 01:00:11.370000 51.89 -6
4 2017-10-24 01:00:12.573000 51.90 5
5 2017-10-24 01:00:13.573000 51.89 -2
6 2017-10-24 01:00:13.776000 51.90 1
7 2017-10-24 01:00:21.276000 51.89 -1
8 2017-10-24 01:00:21.276000 51.88 -1
9 2017-10-24 01:00:21.276000 51.88 -2
10 2017-10-24 01:00:29.979000 51.89 1
If I convert it to a NumPy array, I can access the minute attribute of the date:
>>> array_df = df.values
>>> array_df[:,0] = np.array(array_df[:,0], dtype='datetime64[ms]')
>>> array_df
array([[datetime.datetime(2017, 10, 24, 1, 0, 7, 870000), 51.9, 1],
[datetime.datetime(2017, 10, 24, 1, 0, 10, 167000), 51.9, 1],
[datetime.datetime(2017, 10, 24, 1, 0, 11, 370000), 51.89, -1],
..., dtype=object)
>>> array_df[0][0].minute
0
But when I create a structured array with the same datetime64[ms] type, I cannot access its minute attribute:
>>> array_structured = np.zeros(10, dtype=[('index', np.int32),
('date', 'datetime64[ms]'),
('price', np.float32),
('neg_value', np.int32),
('pos_value', np.int32)])
>>> array_structured
array([(0, '1970-01-01T00:00:00.000', 0., 0, 0),
(0, '1970-01-01T00:00:00.000', 0., 0, 0),
(0, '1970-01-01T00:00:00.000', 0., 0, 0),
(0, '1970-01-01T00:00:00.000', 0., 0, 0),
(0, '1970-01-01T00:00:00.000', 0., 0, 0),
(0, '1970-01-01T00:00:00.000', 0., 0, 0),
(0, '1970-01-01T00:00:00.000', 0., 0, 0),
(0, '1970-01-01T00:00:00.000', 0., 0, 0),
(0, '1970-01-01T00:00:00.000', 0., 0, 0),
(0, '1970-01-01T00:00:00.000', 0., 0, 0)],
dtype=[('index', '<i4'), ('date', '<M8[ms]'), ('price', '<f4'), ('neg_value', '<i4'), ('pos_value', '<i4')])
>>> array_structured['date'][0] = np.datetime64('2017-10-24 01:00:07.870000')
>>> array_structured['date'][0].minute
Traceback (most recent call last):
File "<console>", line 1, in <module>
AttributeError: 'numpy.datetime64' object has no attribute 'minute'
Even though both use the datetime64[ms] dtype, why does the date in array_structured have no minute attribute when the date in array_df does?
Answer 0 (score: 2)
In [57]: data
Out[57]:
array([('1971-01-01T00:00:00.000', 0.), ('1972-01-01T00:00:00.000', 0.),
('2017-10-31T00:00:00.000', 0.)],
dtype=[('date', '<M8[ms]'), ('price', '<f4')])
In [58]: adate = data['date'][0]
In [59]: adate
Out[59]: numpy.datetime64('1971-01-01T00:00:00.000')
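Elements of a datetime64 array (numpy.datetime64 scalars) have no attributes such as minute. But when they are extracted to Python with item or tolist, they become datetime objects. The object-dtype array_df in the question already holds datetime.datetime objects, as its output shows, which is why .minute works there: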
In [68]: data['date'].tolist()
Out[68]:
[datetime.datetime(1971, 1, 1, 0, 0),
datetime.datetime(1972, 1, 1, 0, 0),
datetime.datetime(2017, 10, 31, 0, 0)]
In [61]: adate.item()
Out[61]: datetime.datetime(1971, 1, 1, 0, 0)
In [62]: adate.item().minute
Out[62]: 0
In [63]: adate.item().year
Out[63]: 1971
In [65]: [d.year for d in data['date'].tolist()]
Out[65]: [1971, 1972, 2017]
They can also be converted to other time units using astype:
In [66]: data['date'].astype('datetime64[Y]')
Out[66]: array(['1971', '1972', '2017'], dtype='datetime64[Y]')
In [67]: data['date'].astype('datetime64[m]')
Out[67]: array(['1971-01-01T00:00', '1972-01-01T00:00', '2017-10-31T00:00'], dtype='datetime64[m]')
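Building on the astype conversions, one possible way to get the minute of the hour for a whole array without going through Python datetime objects is to truncate to minutes and to hours and subtract. This is only a sketch; the sample timestamps are invented for illustration.

```python
import numpy as np

# Invented sample timestamps at millisecond resolution.
dates = np.array(['2017-10-24T01:00:07.870',
                  '2017-10-24T01:07:10.167',
                  '2017-10-24T02:59:29.979'], dtype='datetime64[ms]')

# Truncating to minutes and to hours, then subtracting, leaves a
# timedelta64[m] array holding the minutes elapsed within each hour.
within_hour = dates.astype('datetime64[m]') - dates.astype('datetime64[h]')

# Cast the timedelta to plain integers.
minute_of_hour = within_hour.astype(int)
print(minute_of_hour)
```

This stays vectorized, which may matter for large arrays where a list comprehension over tolist() would be slow.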