I have an array like this, and for each row I want to return the column indices of the values that exceed a threshold of 0.6:
X = array([[0.16, 0.40, 0.61, 0.48, 0.20],
           [0.42, 0.79, 0.64, 0.54, 0.52],
           [0.64, 0.64, 0.24, 0.63, 0.43],
           [0.33, 0.54, 0.61, 0.43, 0.29],
           [0.25, 0.56, 0.42, 0.69, 0.62]])
The result would be:
[[2],
[1, 2],
[0, 1, 3],
[2],
[3, 4]]
Is there a better way than a double loop like this one?
def get_column_over_threshold(data, threshold):
    # One (initially empty) list of column indices per row
    column_numbers = [[] for _ in range(len(data))]
    for row_idx, sample in enumerate(data):
        for col_idx, value in enumerate(sample):
            if value >= threshold:
                column_numbers[row_idx].append(col_idx)
    return column_numbers
Answer 0 (score: 1)
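Use np.where to get the row and column indices, then use np.split to cut the column indices at the positions where the row index changes, which gives the column indices of each row as a list of arrays: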
In [18]: r,c = np.where(X>0.6)
In [19]: np.split(c,np.flatnonzero(r[:-1] != r[1:])+1)
Out[19]: [array([2]), array([1, 2]), array([0, 1, 3]), array([2]), array([3, 4])]
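The boundary computation in the one-liner above can be unpacked step by step. This is a sketch that reuses the question's X; the copy named Y is an illustrative assumption added to show the limitation: a row with no match produces no group at all, so the resulting groups no longer line up with row numbers.

```python
import numpy as np

X = np.array([[0.16, 0.40, 0.61, 0.48, 0.20],
              [0.42, 0.79, 0.64, 0.54, 0.52],
              [0.64, 0.64, 0.24, 0.63, 0.43],
              [0.33, 0.54, 0.61, 0.43, 0.29],
              [0.25, 0.56, 0.42, 0.69, 0.62]])

# Row and column indices of every element above 0.6, in row-major order:
# r = [0 1 1 2 2 2 3 4 4], c = [2 1 2 0 1 3 2 3 4]
r, c = np.where(X > 0.6)

# A new group starts wherever the row index changes: [1 3 6 7]
boundaries = np.flatnonzero(r[:-1] != r[1:]) + 1

groups = np.split(c, boundaries)

# Illustrative pitfall: a row with no match simply yields no group.
Y = X.copy()
Y[2] = 0.1  # row 2 now has no value above 0.6
ry, cy = np.where(Y > 0.6)
groups_y = np.split(cy, np.flatnonzero(ry[:-1] != ry[1:]) + 1)
# groups_y has 4 entries for 5 rows, so entry 2 belongs to row 3
```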
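To make this more general, so that it also handles rows without any match, we can iterate over the groups of column indices obtained from np.where and assign each group into a pre-initialized object array, like so: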
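import numpy as np

def col_indices_per_row(X, thresh):
    mask = X > thresh
    r, c = np.where(mask)
    out = np.empty(len(X), dtype=object)  # rows with no match stay None
    if r.size == 0:  # no match anywhere: nothing to group
        return out
    # Start/end offsets in r (and c) of each run of equal row indices
    grp_idx = np.r_[0, np.flatnonzero(r[:-1] != r[1:]) + 1, len(r)]
    # The row number that each run belongs to
    valid_rows = r[np.r_[True, r[:-1] != r[1:]]]
    for row, i, j in zip(valid_rows, grp_idx[:-1], grp_idx[1:]):
        out[row] = c[i:j]
    return out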
Sample run (row 2 has no value above 0.6):
In [92]: X
Out[92]:
array([[0.16, 0.4 , 0.61, 0.48, 0.2 ],
[0.42, 0.79, 0.64, 0.54, 0.52],
[0.1 , 0.1 , 0.1 , 0.1 , 0.1 ],
[0.33, 0.54, 0.61, 0.43, 0.29],
[0.25, 0.56, 0.42, 0.69, 0.62]])
In [93]: col_indices_per_row(X, thresh=0.6)
Out[93]:
array([array([2]), array([1, 2]), None, array([2]), array([3, 4])],
dtype=object)
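A possible alternative that avoids the Python-level loop is to count the matches in each row and split the column indices at the running totals. This is a sketch, not part of the answer; col_indices_per_row_split is a hypothetical name, and unlike the version above it returns a plain list in which a row with no match gets an empty array instead of None.

```python
import numpy as np

def col_indices_per_row_split(X, thresh):
    """Hypothetical helper (not from the answer above): column indices
    greater than `thresh` for every row of a 2-D array.

    Returns a list with one 1-D integer array per row; rows with no
    match get an empty array rather than None.
    """
    mask = X > thresh
    # np.nonzero walks the mask in row-major order, so the column
    # indices belonging to one row are contiguous in `c`.
    _, c = np.nonzero(mask)
    # Matches per row; their running total gives the split offsets.
    counts = mask.sum(axis=1)
    return np.split(c, np.cumsum(counts)[:-1])


# Same data as the sample run: row 2 has no value above 0.6.
X = np.array([[0.16, 0.40, 0.61, 0.48, 0.20],
              [0.42, 0.79, 0.64, 0.54, 0.52],
              [0.10, 0.10, 0.10, 0.10, 0.10],
              [0.33, 0.54, 0.61, 0.43, 0.29],
              [0.25, 0.56, 0.42, 0.69, 0.62]])
result = col_indices_per_row_split(X, 0.6)
```

Because every entry is an array, callers can use len() or iterate over each row's indices without checking for None first.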
Answer 1 (score: 1)
For each row, you can ask for the indices of the elements greater than 0.6:
result = [where(row > 0.6) for row in X]
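This performs the computation you want, but the format of result is a little inconvenient: for a 1-D row, where returns a tuple of size 1 containing a NumPy array of the indices. We can replace where with flatnonzero to get the array directly instead of a tuple. To get a list of lists, we then convert each array explicitly to a list: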
result = [list(flatnonzero(row > 0.6)) for row in X]
(The code above assumes you have done from numpy import *.)
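The same one-liner can also be written without the star import. A minimal sketch, assuming the question's X: it calls .tolist() instead of list(...), so each inner list holds plain Python ints rather than NumPy integer scalars.

```python
import numpy as np

X = np.array([[0.16, 0.40, 0.61, 0.48, 0.20],
              [0.42, 0.79, 0.64, 0.54, 0.52],
              [0.64, 0.64, 0.24, 0.63, 0.43],
              [0.33, 0.54, 0.61, 0.43, 0.29],
              [0.25, 0.56, 0.42, 0.69, 0.62]])

# flatnonzero(row > 0.6) gives the column indices of one row as a 1-D array;
# .tolist() turns that array into a list of plain Python ints.
result = [np.flatnonzero(row > 0.6).tolist() for row in X]
```

A row with no match simply becomes an empty list. Using the np. prefix also avoids from numpy import * shadowing Python built-ins such as sum.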