Question

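I have a numpy ndarray with a compound dtype that I would like to view as a collection of single-dtype ndarrays (so that I can put them into a Pandas DataFrame more efficiently). For example: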

import numpy as np

raw = np.random.randn(10, 5)
dtype = [("a", "f8"), ("b", "f8"), ("c", "i8"), ("d", "i8"), ("e", "i4"), ("f", "i4")]
data = raw.view(dtype)

# ab, cd, ef are *almost* what I need
ab = data[["a", "b"]]
cd = data[["c", "d"]]
ef = data[["e", "f"]]

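ab, cd, and ef are almost what I need, but they still have compound dtypes. My instinct was to try ab.view("f8"), but that returns a view of all of raw, not just the relevant columns. The same goes for cd.view("i8") and ef.view("i4"). In non-toy examples, these .view calls simply raise ValueError: When changing to a smaller dtype, its size must be a divisor of the size of original dtype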
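A quick check of the dtype metadata seems to explain the behaviour described above. This is only a sketch, and it assumes NumPy 1.16 or later, where multi-field indexing returns a view that keeps the parent record's full itemsize. The names itemsize, offsets, and wide_shape are introduced here purely for illustration.

```python
import numpy as np

raw = np.random.randn(10, 5)
dtype = [("a", "f8"), ("b", "f8"), ("c", "i8"), ("d", "i8"), ("e", "i4"), ("f", "i4")]
data = raw.view(dtype)  # shape (10, 1): each 40-byte row becomes one record

ab = data[["a", "b"]]

# Assumption: NumPy >= 1.16, where ab is a view that still spans all 40 bytes
# of each record; the fields "a" and "b" simply sit at offsets 0 and 8.
itemsize = ab.dtype.itemsize
offsets = [ab.dtype.fields[name][1] for name in ab.dtype.names]

# Reinterpreting those 40 bytes as f8 therefore yields 5 values per row,
# which is why the result looks like all of raw.
wide_shape = ab.view("f8").shape
print(itemsize, offsets, wide_shape)
```

If this reading is right, the ValueError in larger examples would come from the same padding: the padded record size is no longer a multiple of the target dtype's size.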

The following seems to work, but it feels very clunky and brittle:

alt = data.view("u1")

first_16_bytes = alt[:, :2*8]
ab2 = first_16_bytes.reshape(-1).view("f8").reshape(len(data), -1)

next_16_bytes = alt[:, 2*8: 4*8]
cd2 = next_16_bytes.reshape(-1).view("i8").reshape(len(data), -1)

last_8_bytes = alt[:, 4*8:]
ef2 = last_8_bytes.reshape(-1).view("i4").reshape(len(data), -1)

ab2, cd2, and ef2 are exactly what I need, but surely there is a better way, right?
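One candidate that might replace the byte slicing is numpy.lib.recfunctions.structured_to_unstructured (available since NumPy 1.16). The sketch below is untested beyond this toy case. The names records, ab3, cd3, and ef3 are made up for the example, and whether each result is a view or a copy may depend on the NumPy version.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

raw = np.random.randn(10, 5)
dtype = [("a", "f8"), ("b", "f8"), ("c", "i8"), ("d", "i8"), ("e", "i4"), ("f", "i4")]
data = raw.view(dtype)  # shape (10, 1)

# Drop the trailing length-1 axis so each result comes out as (n_rows, n_fields).
records = data.reshape(-1)

# Each call turns a group of same-typed fields into a plain 2-D array whose
# dtype is the common dtype of the selected fields.
ab3 = rfn.structured_to_unstructured(records[["a", "b"]])  # float64, shape (10, 2)
cd3 = rfn.structured_to_unstructured(records[["c", "d"]])  # int64, shape (10, 2)
ef3 = rfn.structured_to_unstructured(records[["e", "f"]])  # int32, shape (10, 2)

print(ab3.dtype, cd3.dtype, ef3.dtype)
```

Each of these arrays could then be handed to a DataFrame constructor with the matching column names, which avoids hard-coding byte offsets such as 2*8 and 4*8.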

Extracting contiguous homogeneous-dtype ndarrays from a structured ndarray

0 answers: