Question

I have a large dataset for which I would like to implement an efficient numpy solution. As a simpler example, consider a small set of numbers.

import numpy as np 
arr = np.linspace(1, 10, 10)

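The code below is very close to my ideal solution, but I have hit a roadblock. First, I create a boolean mask that marks the indices at which the array values are greater than a predefined lower bound and less than a predefined upper bound. Then I split the boolean mask into subarrays, each made up of identical values at consecutive indices. For example, [0, 0, 0, 1, 1, 0, 0, 1, 1, 1] is split into [0, 0, 0], [1, 1], [0, 0], [1, 1, 1]. Finally, I want to take every subarray that consists only of 1s and split it into individual subarrays. For example, [1, 1, 1] should be split into [1], [1], [1].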

def get_groups_by_difference(array, difference):
    """ This function splits arrays into subarrays in which every element is identical. """
    return np.split(array[:], np.where(abs(np.diff(array)) != difference)[0] + 1)

def check_consecutive_nested_arrays(array, repeated_value):
    """ This function returns a boolean array mask - True if all elements of a subarray contain the repeated value; False otherwise. """
    return np.array([np.all(subarray == repeated_value) for subarray in array])

def get_solution(array, lbound, ubound):
    # get boolean mask for array values within bounds
    bool_cnd = np.logical_and(array>lbound, array<ubound)
    # convert True/False into 1/0
    bool_cnd = bool_cnd * 1
    # split array into subarrays of identical values by consecutive index
    # (dtype=object is required for ragged subarrays; NumPy >= 1.24 raises ValueError without it)
    stay_idx = np.array(get_groups_by_difference(bool_cnd, 0), dtype=object)
    # find indices of subarrays of ones
    bool_chk = check_consecutive_nested_arrays(stay_idx, 1)
    # get full subarrays of ones
    ones_sub = stay_idx[bool_chk]
    return bool_cnd, stay_idx, bool_chk, ones_sub

bool_cnd, stay_idx, bool_chk, ones_sub = get_solution(arr, 3, 7)

print(bool_cnd)
>> [0 0 0 1 1 1 0 0 0 0]

print(stay_idx)
>> [array([0, 0, 0]) array([1, 1, 1]) array([0, 0, 0, 0])]

print(bool_chk)
>> [False  True False]

print(ones_sub)
>> [array([1, 1, 1])]

This code accomplishes most of what I want, but it is inconvenient. I would like all of the subarrays stored in a single array, from which I can count the number of subarrays and the number of elements in each subarray. Unfortunately, this is tricky for me, because the functions output numpy arrays, i.e. array(...) rather than (...). I think there may be a way to use np.ndarray.T on the True/False values together with the axis kwarg, but so far I have not managed to implement this approach. How can I simplify this process?

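My goal is to get a result like the following array:

[[0 0 0]
 [1]
 [1]
 [1]
 [0 0 0 0]]

That way, I can find the number of subarrays and the number of elements in each subarray (i.e. 5 subarrays with lengths [3, 1, 1, 1, 4]).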
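A compact sketch of the steps described in the question, run on the example mask [0, 0, 0, 1, 1, 0, 0, 1, 1, 1] from the question text. It assumes the mask is already an integer array of 0s and 1s; the variable names (mask, runs, pieces) are hypothetical and this is not the asker's code. It also reads off the count and lengths the question is after.

```python
import numpy as np

# Example mask from the question text (assumed to be 0/1 integers)
mask = np.array([0, 0, 0, 1, 1, 0, 0, 1, 1, 1])

# Split wherever consecutive values change, giving runs of identical values
runs = np.split(mask, np.flatnonzero(np.diff(mask)) + 1)
# runs: [0 0 0], [1 1], [0 0], [1 1 1]

# Break every run of 1s into single-element subarrays
pieces = []
for run in runs:
    if run[0] == 1:
        pieces.extend(np.split(run, run.size))  # [1 1 1] -> [1], [1], [1]
    else:
        pieces.append(run)

# Number of subarrays and number of elements in each
lengths = [p.size for p in pieces]
print(len(pieces), lengths)  # 7 [3, 1, 1, 2, 1, 1, 1]
```

The runs are never empty because every split point lies strictly inside the mask, so checking run[0] is enough to tell a run of 1s from a run of 0s.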

Answer 1

You can then post-process your result:

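ret = []
for idx, check in zip(stay_idx, bool_chk):
    if check:
        # split a run of ones into single-element subarrays: [1, 1, 1] -> [1], [1], [1]
        ret.extend(np.split(idx, idx.size))
    else:
        ret.append(idx)
# dtype=object keeps the ragged subarrays together in one 1-D array
ret = np.array(ret, dtype=object)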

It is not particularly pretty, but it may be good enough for your specific needs.
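To hold the ragged subarrays in a single NumPy array, as the question asks, one option is a 1-D object array filled element by element. This is a minimal sketch with a hypothetical example list standing in for ret; pre-allocating with np.empty(..., dtype=object) means NumPy never tries to stack the pieces into a 2-D array (which np.array does when they all happen to have the same length) and never hits the ragged-input ValueError raised by NumPy 1.24 and later.

```python
import numpy as np

# Hypothetical example list of ragged subarrays, standing in for `ret`
ret = [np.array([0, 0, 0]), np.array([1]), np.array([1]),
       np.array([1]), np.array([0, 0, 0, 0])]

# Pre-allocate a 1-D object array and fill it one element at a time,
# so each slot stores a whole subarray as a single object
out = np.empty(len(ret), dtype=object)
for i, sub in enumerate(ret):
    out[i] = sub

print(out.shape)               # (5,)
print([len(s) for s in out])   # [3, 1, 1, 1, 4]
```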

Answer 2

If I understand correctly,

np.split(a, 1 + np.where(a[1:]|a[:-1])[0])

should do what you want. Here a is the vector of 1s and 0s.

This uses the fact that you can get the final result by splitting to the left and to the right of every 1.
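A short run of this one-liner on the question's mask, printing the cut positions it produces. The last step, which takes the count and lengths directly from the cut positions with np.diff instead of building the subarrays, is an addition not found in the answer; it avoids creating one small array per 1, which may matter for a large dataset.

```python
import numpy as np

arr = np.linspace(1, 10, 10)
a = ((arr > 3) & (arr < 7)).astype(int)   # [0 0 0 1 1 1 0 0 0 0]

# A cut goes between positions i and i+1 whenever either neighbour is a 1
cuts = 1 + np.where(a[1:] | a[:-1])[0]
print(cuts.tolist())                       # [3, 4, 5, 6]

parts = np.split(a, cuts)
print([p.tolist() for p in parts])         # [[0, 0, 0], [1], [1], [1], [0, 0, 0, 0]]

# Count and lengths straight from the cut positions (no subarrays needed)
bounds = np.concatenate(([0], cuts, [a.size]))
lengths = np.diff(bounds)
print(lengths.size, lengths.tolist())      # 5 [3, 1, 1, 1, 4]
```

The bitwise | requires an integer or boolean a; a float mask would need converting first.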

How can I simplify the process of splitting/counting numpy subarrays?
