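I have two numpy ndarrays, each with its own timestamp dimension. I want to merge them together. However, their timestamps are not necessarily at the same intervals. Here is an example of what I mean: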
Array 1: names = ['timestamp', 'value']
a1 = [(1531000000, 0), (1532000000, 1), (1533000000, 2), (1534000000, 3)]
Array 2: names = ['timestamp', 'color']
a2 = [(1531500000, "blue"), (1532000000, "black"), (1533500000, "green"), (1534000000, "red")]
Resulting Array: names = ['timestamp', 'value', 'color']
a3 = [(1531000000, 0, nan), (1531500000, nan, "blue"), (1532000000, 1, "black"), (1533000000, 2, nan), (1533500000, nan, "green"), (1534000000, 3, "red")]
Answer 0 (score: 2)
With Pandas, you can perform an outer merge and then sort. This is a natural fit, since Pandas is built on NumPy arrays.
import pandas as pd
res = pd.merge(df1, df2, how='outer').sort_values('timestamp').values.tolist()
Result
[[1531000000, 0.0, nan],
[1531500000, nan, 'blue'],
[1532000000, 1.0, 'black'],
[1533000000, 2.0, nan],
[1533500000, nan, 'green'],
[1534000000, 3.0, 'red']]
Setup
names = ['timestamp', 'value']
a1 = [(1531000000, 0), (1532000000, 1), (1533000000, 2), (1534000000, 3)]
df1 = pd.DataFrame(a1, columns=names)
names = ['timestamp', 'color']
a2 = [(1531500000, "blue"), (1532000000, "black"), (1533500000, "green"), (1534000000, "red")]
df2 = pd.DataFrame(a2, columns=names)
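If the inputs are already NumPy structured arrays rather than lists of tuples, the same approach can be sketched as below. This is a minimal sketch, not part of the original answer: the names s1, s2, merged and a3 are made up for illustration, and it assumes the field names of the structured arrays are what you want as column names.

```python
import numpy as np
import pandas as pd

# Hypothetical structured-array inputs mirroring the question's data.
s1 = np.array(
    [(1531000000, 0), (1532000000, 1), (1533000000, 2), (1534000000, 3)],
    dtype=[('timestamp', int), ('value', int)],
)
s2 = np.array(
    [(1531500000, "blue"), (1532000000, "black"),
     (1533500000, "green"), (1534000000, "red")],
    dtype=[('timestamp', int), ('color', '<U5')],
)

# A DataFrame built from a structured array takes its columns from the field names.
df1 = pd.DataFrame(s1)
df2 = pd.DataFrame(s2)

# Outer merge on the shared key, then sort by it.
merged = pd.merge(df1, df2, on='timestamp', how='outer').sort_values('timestamp')

# Convert back to a NumPy record array; missing entries stay as NaN.
a3 = merged.to_records(index=False)
```

Note that the 'value' column becomes floating point after the merge, because NaN cannot be stored in an integer column.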
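Answer 1 (score: 1)
Setup
It looks like you are showing structured arrays here, so I will assume you are using them. If you are not using structured arrays, you should be, in which case you can create them like this: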
import numpy as np
a1 = np.array(a1, dtype=[('timestamp', int), ('value', int)])
a2 = np.array(a2, dtype=[('timestamp', int), ('color', '<U5')])
Now you can use numpy.lib.recfunctions here:
import numpy.lib.recfunctions as recfunctions
out = recfunctions.join_by('timestamp', a1, a2, jointype='outer')
masked_array(data=[(1531000000, 0, --), (1531500000, --, 'blue'),
(1532000000, 1, 'black'), (1533000000, 2, --),
(1533500000, --, 'green'), (1534000000, 3, 'red')],
mask=[(False, False, True), (False, True, False),
(False, False, False), (False, False, True),
(False, True, False), (False, False, False)],
fill_value=(999999, 999999, 'N/A'),
dtype=[('timestamp', '<i4'), ('value', '<i4'), ('color', '<U5')])
The output looks a little messy, but that is just what the repr of an np.ma.masked_array looks like. It is easy to see that this is the correct output:
out.tolist()
[(1531000000, 0, None),
(1531500000, None, 'blue'),
(1532000000, 1, 'black'),
(1533000000, 2, None),
(1533500000, None, 'green'),
(1534000000, 3, 'red')]
However, with a masked array you have access to a range of utility functions for filling in the missing values properly.
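As a rough illustration of what that can look like, the sketch below fills each field of the joined result separately using the masked array's filled method. The placeholder values -1 and 'none', and the names values, colors and missing_value, are arbitrary choices for this example rather than anything the original answer prescribes.

```python
import numpy as np
import numpy.lib.recfunctions as recfunctions

a1 = np.array(
    [(1531000000, 0), (1532000000, 1), (1533000000, 2), (1534000000, 3)],
    dtype=[('timestamp', int), ('value', int)],
)
a2 = np.array(
    [(1531500000, "blue"), (1532000000, "black"),
     (1533500000, "green"), (1534000000, "red")],
    dtype=[('timestamp', int), ('color', '<U5')],
)
out = recfunctions.join_by('timestamp', a1, a2, jointype='outer')

# Selecting a field of the masked structured array gives a masked array
# for that field; filled() replaces masked entries with the given value.
values = out['value'].filled(-1)       # -1 is an arbitrary placeholder
colors = out['color'].filled('none')   # 'none' is an arbitrary placeholder

# Boolean array marking the rows where 'value' was missing.
missing_value = np.ma.getmaskarray(out['value'])
```

Filling per field keeps the placeholder consistent with each field's dtype, which is harder to arrange with a single fill value for the whole record.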