Storing and extracting numpy datetimes in PyTables

What I have tried so far

Setup

In [1]: import tables as tb
In [2]: import numpy as np
In [3]: from datetime import datetime

Create data

In [4]: data = [(1, datetime(2000, 1, 1, 1, 1, 1)), (2, datetime(2001, 2, 2, 2, 2, 2))]
In [5]: rec = np.array(data, dtype=[('a', 'i4'), ('b', 'M8[us]')])
In [6]: rec  # a numpy array with my data
Out[6]: 
array([(1, datetime.datetime(2000, 1, 1, 1, 1, 1)),
       (2, datetime.datetime(2001, 2, 2, 2, 2, 2))], 
      dtype=[('a', '<i4'), ('b', '<M8[us]')])

Using the `Time64Col` descriptor

Open a PyTables dataset

In [7]: f = tb.open_file('foo.h5', 'w')  # New PyTables file
In [8]: d = f.create_table('/', 'bar', description={'a': tb.Int32Col(pos=0), 
                                                    'b': tb.Time64Col(pos=1)})
In [9]: d
Out[9]: 
/bar (Table(0,)) ''
  description := {
  "a": Int32Col(shape=(), dflt=0, pos=0),
  "b": Time64Col(shape=(), dflt=0.0, pos=1)}
  byteorder := 'little'
  chunkshape := (5461,)

Append the NumPy data to the PyTables dataset

In [10]: d.append(rec)
In [11]: d
Out[11]: 
/bar (Table(2,)) ''
  description := {
  "a": Int32Col(shape=(), dflt=0, pos=0),
  "b": Time64Col(shape=(), dflt=0.0, pos=1)}
  byteorder := 'little'
  chunkshape := (5461,)

What happened to my datetimes?

In [12]: d[:]
Out[12]: 
array([(1, 0.0), (2, 0.0)], 
      dtype=[('a', '<i4'), ('b', '<f8')])

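As far as I know, HDF5 does not provide native support for datetimes. I would expect the extra metadata that PyTables layers on top to handle this, though.

My question

How can I store a numpy record array that contains datetimes in PyTables? How can I efficiently extract that data from a PyTables table back into a NumPy array and keep my datetimes?

Common answers

I commonly get this answer:

Use Pandas

I don't want to use Pandas because I don't have an index, I don't want one stored in my dataset, and Pandas doesn't allow you to not have or store an index (see {{3 }})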

Answer 1

First, when putting values into a Time64Col they need to be float64s. You can do this with a call to astype, like so:

new_rec = rec.astype([('a', 'i4'), ('b', 'f8')])

Then you need to convert column b into seconds since the epoch, which means dividing by 1,000,000, since the values are in microseconds:

new_rec['b'] = new_rec['b'] / 1e6
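As a sanity check on the arithmetic, the epoch values can be reproduced with the standard library alone. This is a small sketch, not part of the original answer: `to_epoch_micros` is a hypothetical helper name, and it assumes the naive datetimes are to be read as UTC, which is consistent with the numbers in the session output further down.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_epoch_micros(ts):
    """Integer microseconds since the Unix epoch for a naive datetime read as UTC."""
    delta = ts.replace(tzinfo=timezone.utc) - EPOCH
    # Integer arithmetic only, so nothing is lost to float rounding here.
    return delta.days * 86_400_000_000 + delta.seconds * 1_000_000 + delta.microseconds

for ts in (datetime(2000, 1, 1, 1, 1, 1), datetime(2001, 2, 2, 2, 2, 2)):
    micros = to_epoch_micros(ts)
    # Dividing by 1e6 gives the float seconds that end up in the table.
    print(micros, micros / 1e6)
```

The two microsecond counts are 946688461000000 and 981079322000000, so dividing by 1e6 gives 946688461.0 and 981079322.0 seconds.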

Then call d.append(new_rec)

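When you read the array back into memory, do the reverse and multiply by 1,000,000. You have to make sure things are in microseconds before putting anything in, which is handled automatically by astype('datetime64[us]') in numpy >= 1.7.x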
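The two directions of that conversion can be wrapped in a pair of helpers. The following is only a sketch under a few assumptions: the function names `datetimes_to_seconds` and `seconds_to_datetimes` are made up for illustration, and the rounding step with `np.rint` is an addition of mine, meant to guard against float error when a timestamp has a fractional-second part.

```python
import numpy as np

def datetimes_to_seconds(col):
    """Convert a datetime64 column to float64 seconds since the epoch."""
    # Normalise to microseconds, then expose the raw integer count.
    micros = np.asarray(col).astype('M8[us]').astype('i8')
    return micros / 1e6

def seconds_to_datetimes(secs):
    """Convert float64 epoch seconds back to datetime64[us]."""
    # Round to the nearest whole microsecond before casting, so that
    # a value such as x.123455999... is not truncated one tick short.
    micros = np.rint(np.asarray(secs, dtype='f8') * 1e6).astype('i8')
    return micros.astype('M8[us]')

stamps = np.array(['2000-01-01T01:01:01.123456', '2001-02-02T02:02:02'],
                  dtype='M8[us]')
secs = datetimes_to_seconds(stamps)      # what would go into the Time64Col
restored = seconds_to_datetimes(secs)    # what comes back after reading
print((restored == stamps).all())
```

With helpers like these, the column `b` would be converted just before `d.append(...)` and converted back right after reading `d[:]`.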

I used the solution from this question: How to get unix timestamp from numpy.datetime64

Here is a working version of your example:

In [4]: data = [(1, datetime(2000, 1, 1, 1, 1, 1)), (2, datetime(2001, 2, 2, 2, 2, 2))]

In [5]: rec = np.array(data, dtype=[('a', 'i4'), ('b', 'M8[us]')])

In [6]: new_rec = rec.astype([('a', 'i4'), ('b', 'f8')])

In [7]: new_rec
Out[7]:
array([(1, 946688461000000.0), (2, 981079322000000.0)],
      dtype=[('a', '<i4'), ('b', '<f8')])

In [8]: new_rec['b'] /= 1e6

In [9]: new_rec
Out[9]:
array([(1, 946688461.0), (2, 981079322.0)],
      dtype=[('a', '<i4'), ('b', '<f8')])

In [10]: f = tb.open_file('foo.h5', 'w')  # New PyTables file

In [11]: d = f.create_table('/', 'bar', description={'a': tb.Int32Col(pos=0),
   ....:                                             'b': tb.Time64Col(pos=1)})

In [12]: d.append(new_rec)

In [13]: d[:]
Out[13]:
array([(1, 946688461.0), (2, 981079322.0)],
      dtype=[('a', '<i4'), ('b', '<f8')])

In [14]: r = d[:]

In [15]: r['b'] *= 1e6

In [16]: r.astype([('a', 'i4'), ('b', 'datetime64[us]')])
Out[16]:
array([(1, datetime.datetime(2000, 1, 1, 1, 1, 1)),
       (2, datetime.datetime(2001, 2, 2, 2, 2, 2))],
      dtype=[('a', '<i4'), ('b', '<M8[us]')])
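If the float round trip is a concern, one alternative worth considering is to keep the timestamps as integer microseconds. This is not what the answer above does; it is a sketch that assumes the table column would be declared with an integer type such as `tb.Int64Col` instead of `tb.Time64Col`, and it only shows the NumPy side. Since `M8[us]` and `i8` are both 8 bytes wide, the record array can be reinterpreted with `view` without copying or arithmetic.

```python
import numpy as np
from datetime import datetime

data = [(1, datetime(2000, 1, 1, 1, 1, 1)), (2, datetime(2001, 2, 2, 2, 2, 2))]

dt_dtype = np.dtype([('a', 'i4'), ('b', 'M8[us]')])   # in-memory layout
int_dtype = np.dtype([('a', 'i4'), ('b', 'i8')])      # layout to store

rec = np.array(data, dtype=dt_dtype)

# Same bytes, different interpretation: 'b' becomes microseconds since epoch.
as_ints = rec.view(int_dtype)

# After reading the integers back, reinterpret them as datetimes again.
restored = as_ints.view(dt_dtype)

print(as_ints['b'])
print((restored['b'] == rec['b']).all())
```

The trade-off is that the stored column is a plain integer, so any reader of the file has to know that it holds microseconds since the epoch.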
