Question

I would like to perform the operation

$form1$

If $form2$ had a regular shape, I could use np.einsum; I believe the syntax would be

np.einsum('ijp,ipk->ijk',X, alpha)
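For a regular (non-ragged) X, the subscript string 'ijp,ipk->ijk' sums over the shared feature index p separately for each group i. That is the same as a batched matrix product of the (J, P) matrix X[i] with the (P, K) matrix alpha[i]. The sketch below checks this on random data; the sizes I, J, P, K are arbitrary values assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, P, K = 3, 5, 4, 2  # illustrative sizes (assumed, not from the question)
X = rng.standard_normal((I, J, P))
alpha = rng.standard_normal((I, P, K))

# Sum over the shared feature index p, separately for each group i
A_einsum = np.einsum('ijp,ipk->ijk', X, alpha)
# Same contraction written as a batched matrix product over the leading axis
A_matmul = X @ alpha

print(A_einsum.shape)                   # (3, 5, 2)
print(np.allclose(A_einsum, A_matmul))  # True
```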

Unfortunately, my data X has an irregular structure along the 1st axis (if we are zero-indexing).

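To give some more context, $form3$ refers to the p-th feature of the j-th member of the i-th group. Because the groups have different sizes, X is effectively a list of different-length lists of equal-length lists.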
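One common way to hold data shaped like this in NumPy is a single flat (N, P) array containing every member of every group, plus the offsets at which each group starts. The sketch below assumes each group is given as a list of equal-length feature lists; the toy values and the names X_flat and offsets are hypothetical.

```python
import numpy as np

# Ragged input: 3 groups with 2, 1 and 3 members, each member has P = 2 features
X = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0]],
    [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]],
]

group_sizes = np.array([len(g) for g in X])
# All members stacked row-wise: shape (sum of group sizes, P)
X_flat = np.concatenate([np.asarray(g, dtype=float) for g in X])
# Row at which each group starts inside X_flat (exclusive cumulative sum)
offsets = np.concatenate(([0], np.cumsum(group_sizes)[:-1]))

# Recover the per-group arrays as views into X_flat
groups = np.split(X_flat, offsets[1:])

print(X_flat.shape)               # (6, 2)
print(offsets)                    # [0 2 3]
print([g.shape for g in groups])  # [(2, 2), (1, 2), (3, 2)]
```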

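$form4$ has a regular structure and can therefore be stored as a standard numpy array (it starts out 1-dimensional, and I then use alpha.reshape(a, b, c), where a, b, c are problem-specific integers).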
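A minimal sketch of that reshape step. It assumes a is the number of groups, b the number of features P and c the number of outputs K (the question does not say which integer is which), and alpha_flat is a hypothetical stand-in for the 1-D input.

```python
import numpy as np

a, b, c = 3, 2, 4  # assumed meaning: groups, features P, outputs K
alpha_flat = np.arange(a * b * c, dtype=float)  # stand-in for the 1-D input

# reshape uses C (row-major) order by default: the last index varies fastest
alpha = alpha_flat.reshape(a, b, c)

print(alpha.shape)     # (3, 2, 4)
print(alpha[1, 0, 2])  # 10.0, i.e. flat position 1*b*c + 0*c + 2
```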

I would like to avoid storing X as a list of lists, or as a list of np.arrays of different dimensions, and writing something like

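import numpy as np

A = []
for i in range(num_groups):
    # One row per member of group i, one column per output index k
    temp = np.empty((group_sizes[i], alpha.shape[2]), dtype=float)
    for j in range(group_sizes[i]):
        # Row j holds the k-vector for member j of group i
        temp[j] = np.einsum('p,pk->k', X[i][j], alpha[i, :, :])
    A.append(temp)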
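Since np.einsum('p,pk->k', x, M) is just the vector-matrix product x @ M, the inner loop over j can be replaced by one matrix product per group. This is a sketch of that partially vectorized version, assuming each X[i] converts to a (group_sizes[i], P) array; the toy sizes and random data are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
group_sizes = [2, 1, 3]  # illustrative group sizes
P, K = 4, 2              # illustrative feature / output sizes
num_groups = len(group_sizes)
X = [rng.standard_normal((n, P)).tolist() for n in group_sizes]  # ragged lists
alpha = rng.standard_normal((num_groups, P, K))

# One (n_i, P) @ (P, K) product per group: the Python loop runs over groups only
A = [np.asarray(X[i]) @ alpha[i] for i in range(num_groups)]

# Reference: explicit double loop over groups and members
A_ref = []
for i in range(num_groups):
    temp = np.empty((group_sizes[i], K))
    for j in range(group_sizes[i]):
        temp[j] = np.einsum('p,pk->k', X[i][j], alpha[i])
    A_ref.append(temp)

print([a.shape for a in A])                               # [(2, 2), (1, 2), (3, 2)]
print(all(np.allclose(a, r) for a, r in zip(A, A_ref)))  # True
```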

Is there a nice numpy function or data structure for this? Or will I have to settle for an only partially vectorized implementation?

Answer 1

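I know this sounds obvious, but if you can afford the memory, I would first check the performance you get by simply padding the data to a uniform size, i.e. adding zeros and performing the operation as usual. Sometimes a simpler solution is faster than a more optimized one that needs more Python/C round trips.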
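A minimal sketch of the padding idea, assuming X is a list of groups with a common feature count P: zero-pad every group to the largest group size, run the regular einsum once, then trim each group's result back to its true length. Zero input rows produce zero output rows, so trimming loses nothing. The helper name pad_groups and the toy data are hypothetical.

```python
import numpy as np

def pad_groups(X, P):
    """Zero-pad a ragged list of (n_i, P) groups into an (I, J_max, P) array."""
    sizes = np.array([len(g) for g in X])
    padded = np.zeros((len(X), sizes.max(), P))
    for i, g in enumerate(X):
        padded[i, :sizes[i]] = g
    return padded, sizes

rng = np.random.default_rng(2)
P, K = 3, 2
X = [rng.standard_normal((n, P)) for n in (4, 1, 2)]
alpha = rng.standard_normal((len(X), P, K))

X_pad, sizes = pad_groups(X, P)
A_pad = np.einsum('ijp,ipk->ijk', X_pad, alpha)   # single vectorized call
A = [A_pad[i, :sizes[i]] for i in range(len(X))]  # drop the padded rows

print(X_pad.shape)  # (3, 4, 3)
print(all(np.allclose(A[i], X[i] @ alpha[i]) for i in range(len(X))))  # True
print(np.all(A_pad[1, 1:] == 0))  # True: padded rows give zero output
```

The cost is memory: the padded array has num_groups × (largest group size) × P entries, regardless of how small most groups are.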

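If that doesn't work, then, as Tom Wyllie suggested, your best bet is probably a bucketing strategy. Assuming X is a list of lists and alpha is an array, you can first collect the sizes along the second index (maybe you already have these):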

X_sizes = np.array([len(x_i) for x_i in X])

and sort them:

idx_sort = np.argsort(X_sizes)
X_sizes_sorted = X_sizes[idx_sort]
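On a small made-up input (four groups of 3, 1, 4 and 2 members, two features each), the two steps above give the following:

```python
import numpy as np

# Toy ragged X: 4 groups with 3, 1, 4 and 2 members, P = 2 features each
X = [[[0.0, 0.0]] * 3, [[0.0, 0.0]] * 1, [[0.0, 0.0]] * 4, [[0.0, 0.0]] * 2]

X_sizes = np.array([len(x_i) for x_i in X])
idx_sort = np.argsort(X_sizes)  # group indices, smallest group first
X_sizes_sorted = X_sizes[idx_sort]

print(X_sizes)         # [3 1 4 2]
print(idx_sort)        # [1 3 0 2]
print(X_sizes_sorted)  # [1 2 3 4]
```

If several groups share a size, passing kind='stable' to np.argsort keeps their original relative order, which makes the bucket contents reproducible.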

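Then you pick a number of buckets, i.e. the number of pieces your work is split into. Say you choose BUCKETS = 4. You just need to divide the data so that each piece holds roughly the same total number of rows: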

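BUCKETS = 4
sizes_cumsum = np.cumsum(X_sizes_sorted)
total = sizes_cumsum[-1]
bucket_idx = []
for i in range(BUCKETS):
    low = np.round(i * total / float(BUCKETS))
    # Leave the last bucket open-ended so the group with cumsum == total is kept
    high = np.inf if i == BUCKETS - 1 else np.round((i + 1) * total / float(BUCKETS))
    # Parenthesize each comparison: & binds more tightly than >= and <
    m = (sizes_cumsum >= low) & (sizes_cumsum < high)
    idx = np.where(m)[0]
    # Make relative to X, not idx_sort
    idx = idx_sort[idx]
    # Skip empty buckets so later code can safely use idx[-1]
    if idx.size > 0:
        bucket_idx.append(idx)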
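As an alternative to building boolean masks per bucket, each sorted group can be assigned a bucket directly from the row offset at which it starts in the sorted data. This is a swapped-in technique rather than the code above, and make_buckets is a hypothetical helper; every group lands in exactly one bucket, and empty buckets are dropped.

```python
import numpy as np

def make_buckets(X_sizes, n_buckets):
    """Split group indices into roughly equal-work buckets, smallest groups first."""
    idx_sort = np.argsort(X_sizes, kind='stable')
    sizes_sorted = X_sizes[idx_sort]
    total = sizes_sorted.sum()
    # Offset of each group's first row in the sorted, concatenated data
    starts = np.cumsum(sizes_sorted) - sizes_sorted
    # Map offsets in [0, total) onto bucket ids 0 .. n_buckets - 1
    bucket_of = np.minimum(starts * n_buckets // max(total, 1), n_buckets - 1)
    buckets = [idx_sort[bucket_of == b] for b in range(n_buckets)]
    return [b for b in buckets if b.size > 0]

X_sizes = np.array([5, 1, 3, 8, 2, 2, 7, 4])
bucket_idx = make_buckets(X_sizes, 4)
for idx in bucket_idx:
    print(idx, X_sizes[idx], X_sizes[idx].sum())
```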

Then you do the computation for each bucket:

bucket_results = []
for idx in bucket_idx:
    # Groups in idx are sorted by size, so the last one is the biggest
    bucket_size = X_sizes[idx[-1]]
    # Fill a zero-padded array holding only this bucket's groups
    X_bucket = np.zeros((len(idx), bucket_size, alpha.shape[1]), dtype=float)
    for i, X_i in enumerate(idx):
        X_bucket[i, :X_sizes[X_i]] = X[X_i]
    # Compute: contract over p with the alpha slices of these same groups
    res = np.einsum('ijp,ipk->ijk', X_bucket, alpha[idx])
    bucket_results.append(res)

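Filling the X_bucket array in this step may be slow. Again, if you can afford the memory, it would be more efficient to keep X in a single padded array and then just slice X[idx, :bucket_size, :].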
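A sketch of that variant, assuming X has already been zero-padded once into X_pad of shape (num_groups, J_max, P) with the true sizes in X_sizes. The bucket index arrays are written out by hand here for illustration. Indexing the first axis with idx and slicing the second copies only the bucket's groups, trimmed to the bucket's largest group.

```python
import numpy as np

rng = np.random.default_rng(3)
X_sizes = np.array([5, 1, 3, 2])
P, K = 3, 2
# Single zero-padded array for all groups (padded rows stay zero)
X_pad = np.zeros((len(X_sizes), X_sizes.max(), P))
for i, n in enumerate(X_sizes):
    X_pad[i, :n] = rng.standard_normal((n, P))
alpha = rng.standard_normal((len(X_sizes), P, K))

bucket_idx = [np.array([1, 3]), np.array([2, 0])]  # sorted by size within bucket
bucket_results = []
for idx in bucket_idx:
    bucket_size = X_sizes[idx[-1]]          # largest group in this bucket
    X_bucket = X_pad[idx, :bucket_size, :]  # slice instead of refilling
    bucket_results.append(np.einsum('ijp,ipk->ijk', X_bucket, alpha[idx]))

print([r.shape for r in bucket_results])  # [(2, 2, 2), (2, 5, 2)]
```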

Finally, you can put the results back into a list:

result = [None] * len(X)
for res, idx in zip(bucket_results, bucket_idx):
    for r, X_i in zip(res, idx):
        # Drop the padded rows of this group's output
        result[X_i] = r[:X_sizes[X_i]]
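Putting the pieces together, here is a self-contained sketch that sorts, buckets, pads within each bucket, contracts, and scatters the results back. It assumes X is a list of groups, each convertible to an (n_i, P) array, and alpha has shape (num_groups, P, K); it uses the start-offset bucket assignment instead of boolean masks, and the name bucketed_contraction is hypothetical. The check compares against one plain matrix product per group.

```python
import numpy as np

def bucketed_contraction(X, alpha, n_buckets=4):
    """Compute result[i] = X[i] @ alpha[i] for ragged X by padding within buckets."""
    X_sizes = np.array([len(x_i) for x_i in X])
    P = alpha.shape[1]
    idx_sort = np.argsort(X_sizes, kind='stable')
    sizes_sorted = X_sizes[idx_sort]
    starts = np.cumsum(sizes_sorted) - sizes_sorted
    bucket_of = np.minimum(starts * n_buckets // max(sizes_sorted.sum(), 1),
                           n_buckets - 1)
    result = [None] * len(X)
    for b in range(n_buckets):
        idx = idx_sort[bucket_of == b]
        if idx.size == 0:
            continue
        bucket_size = X_sizes[idx[-1]]  # largest group in this bucket
        X_bucket = np.zeros((len(idx), bucket_size, P))
        for i, X_i in enumerate(idx):
            # reshape keeps empty groups as (0, P) so the assignment still works
            X_bucket[i, :X_sizes[X_i]] = np.reshape(X[X_i], (-1, P))
        res = np.einsum('ijp,ipk->ijk', X_bucket, alpha[idx])
        for r, X_i in zip(res, idx):
            result[X_i] = r[:X_sizes[X_i]]  # drop padded rows
    return result

rng = np.random.default_rng(4)
P, K = 3, 2
sizes = [5, 1, 3, 8, 2, 2, 7, 4]
X = [rng.standard_normal((n, P)).tolist() for n in sizes]
alpha = rng.standard_normal((len(X), P, K))

A = bucketed_contraction(X, alpha, n_buckets=4)
print([a.shape[0] for a in A] == sizes)  # True
print(all(np.allclose(A[i], np.asarray(X[i]) @ alpha[i]) for i in range(len(X))))  # True
```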

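Sorry I haven't given a proper, complete function, but I'm not sure exactly what your inputs and expected outputs look like, so I just put the pieces together; you can adapt them as needed.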

How to vectorize/tensorize operations in numpy with irregular array shapes
