Question

I have a numpy array like the following

array([[ 6,  5],
       [ 6,  9],
       [ 7,  5],
       [ 7,  9],
       [ 8, 10],
       [ 9, 10],
       [ 9, 11],
       [10, 10]])

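I want to select the elements so that each y coordinate appears only once. If two elements have the same y coordinate, I want to keep the one with the smaller x coordinate.

Expected output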

array([[ 6,  5],
       [ 6,  9],
       [ 8, 10],
       [ 9, 11]])

Explanation

[6, 5] is selected over [7, 5]

[8, 10] is selected over [9, 10] and [10, 10]

[9, 11] is selected

Thanks

Answer 1

First, sort by the first column:

a = a[a[:, 0].argsort()]

Then use np.unique with the return_index flag to get the index of the first occurrence of each unique y value:

a[np.unique(a[:, 1], return_index=True)[1]]

array([[ 6,  5],
       [ 6,  9],
       [ 8, 10],
       [ 9, 11]])
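The timings below call a function named argsort_unique that is never defined in the answer. Presumably it just wraps the two steps above; a minimal sketch under that assumption (the function body is my reconstruction, not the author's code):

```python
import numpy as np

def argsort_unique(a):
    # Assumed wrapper around the two steps shown above.
    # Step 1: sort rows by x so the smallest x for each y comes first.
    a = a[a[:, 0].argsort()]
    # Step 2: np.unique returns the index of the first occurrence of each y.
    return a[np.unique(a[:, 1], return_index=True)[1]]

a = np.array([[6, 5], [6, 9], [7, 5], [7, 9],
              [8, 10], [9, 10], [9, 11], [10, 10]])
result = argsort_unique(a)
print(result)
```

The result comes back ordered by y, because np.unique sorts the unique values.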

Some timings:

a = np.random.randint(1, 10, 10000).reshape(-1, 2)

In [45]: %timeit rows_by_unique_y(a)
3.83 ms ± 137 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [46]: %timeit argsort_unique(a)
370 µs ± 8.26 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Yes, my approach uses an initial sort, but vectorized operations in numpy beat iteration in Python.
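To reproduce the comparison outside IPython, something like the following should work. It is only a sketch: argsort_unique is my assumed wrapper around the two steps above, rows_by_unique_y is the loop-based function from Answer 3, and the array size and repeat count are arbitrary. Absolute numbers will differ from machine to machine.

```python
import timeit
from collections import defaultdict

import numpy as np

def rows_by_unique_y(arr):
    # Loop-based version: remember the smallest x seen for each y.
    best_for_y = defaultdict(lambda: float("inf"))
    for x, y in arr:
        best_for_y[y] = min(x, best_for_y[y])
    return np.array([[x, y] for y, x in best_for_y.items()])

def argsort_unique(a):
    # Vectorized version: sort by x, then take the first row for each y.
    a = a[a[:, 0].argsort()]
    return a[np.unique(a[:, 1], return_index=True)[1]]

rng = np.random.default_rng(0)
a = rng.integers(1, 10, size=(5000, 2))  # same shape as the test array above

# The loop version returns rows in first-seen order, so sort by y to compare.
loop_result = rows_by_unique_y(a)
loop_sorted = loop_result[loop_result[:, 1].argsort()]
same = np.array_equal(loop_sorted, argsort_unique(a))

loop_time = timeit.timeit(lambda: rows_by_unique_y(a), number=5)
vec_time = timeit.timeit(lambda: argsort_unique(a), number=5)
print(f"results agree: {same}")
print(f"loop: {loop_time:.4f} s, vectorized: {vec_time:.4f} s (5 runs each)")
```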

Answer 2

If you are open to using another library, I would suggest numpy_indexed for an efficient and compact solution

import numpy as np
import numpy_indexed as npi

a = np.array([[6, 5], [6, 9], [7, 5], [7, 9], [8, 10], [9, 10], [9, 11], [10, 10]])

column_to_groupby = 1
groups, reduced = npi.group_by(a[:,column_to_groupby]).min(a)
print(reduced)

This gives the following output

[[ 6  5]
 [ 6  9]
 [ 8 10]
 [ 9 11]]

Here are the timing results

In [5]: a = np.random.randint(1, 10, 10000).reshape(-1, 2)

In [6]: %timeit npi.group_by(a[:,1]).min(a)
354 µs ± 2.29 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
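If adding a dependency is not an option, a similar group-by-minimum can be sketched in plain NumPy. This only illustrates the idea and is not a description of how numpy_indexed works internally; the variable names are my own.

```python
import numpy as np

a = np.array([[6, 5], [6, 9], [7, 5], [7, 9],
              [8, 10], [9, 10], [9, 11], [10, 10]])

# Sort rows by y so that rows sharing a y value are contiguous.
order = np.argsort(a[:, 1], kind="stable")
s = a[order]

# starts[i] is the index where the i-th group of equal y values begins.
ys, starts = np.unique(s[:, 1], return_index=True)

# Minimum x within each contiguous group.
min_x = np.minimum.reduceat(s[:, 0], starts)

result = np.column_stack((min_x, ys))
print(result)
```

Unlike the sort-by-x approach, this one sorts by y and reduces each group, so it does not depend on the rows being ordered by x.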

Answer 3

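One approach is to loop through the array and keep note of the best values you've seen, then reconstruct the array at the end:

import numpy as np
from collections import defaultdict

def rows_by_unique_y(arr):
    # Map each y to the smallest x seen so far
    best_for_y = defaultdict(lambda: float('inf'))
    for row in arr:
        x, y = row[0], row[1]
        best_for_y[y] = min(x, best_for_y[y])
    return np.array([[x, y] for y, x in best_for_y.items()])

arr = np.array([[6, 5], [6, 9], [7, 5], [7, 9], [8, 10], [9, 10], [9, 11], [10, 10]])
print(rows_by_unique_y(arr))

No need to sort, just keep track of the minimums. This outputs:

[[ 6  5]
 [ 6  9]
 [ 8 10]
 [ 9 11]]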

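While this answer is asymptotically faster (a single O(n) pass instead of an O(n log n) sort), user3483203's answer is much better in practice. That is because it calls out to optimized C code rather than staying in Python's surprisingly slow interpreter. However, if your arrays are huge (several gigabytes), the O(n log n) behavior will start to lose out to this approach.

At the same time, if your arrays are that large, you should probably be using a MapReduce framework such as Spark. The algorithm I gave above is easily parallelized.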
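To illustrate why it parallelizes easily: each chunk of the array can be reduced to its own y-to-minimum-x mapping independently, and the partial mappings can then be merged in any order. The sketch below runs the "map" step sequentially in plain Python rather than on Spark, and the helper names (min_x_by_y, merge) and the chunk count are hypothetical.

```python
from functools import reduce

import numpy as np

def min_x_by_y(chunk):
    # "Map" step: smallest x per y within a single chunk.
    best = {}
    for x, y in chunk.tolist():
        if y not in best or x < best[y]:
            best[y] = x
    return best

def merge(d1, d2):
    # "Reduce" step: combine two partial results, keeping the smaller x.
    out = dict(d1)
    for y, x in d2.items():
        if y not in out or x < out[y]:
            out[y] = x
    return out

arr = np.array([[6, 5], [6, 9], [7, 5], [7, 9],
                [8, 10], [9, 10], [9, 11], [10, 10]])

chunks = np.array_split(arr, 3)              # each chunk could go to a worker
partials = [min_x_by_y(c) for c in chunks]   # independent per-chunk work
merged = reduce(merge, partials, {})

result = np.array([[x, y] for y, x in sorted(merged.items())])
print(result)
```

Because taking a minimum is associative and commutative, the merge order does not affect the result.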

If you don't need the minimum x values, then the following one-liner using np.unique works:

arr[np.unique(arr[:,1], return_index=True)[1]]

For the sample array, whose rows are already ordered by x, it selects

[[ 6  5]
 [ 6  9]
 [ 8 10]
 [ 9 11]]

But np.unique returns the index of the first occurrence of each y, so if a row with a larger x comes earlier in the array for some y, that row is returned instead of the one with the smallest x.
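A small demonstration of that order dependence. The swap of rows [8, 10] and [10, 10] is just an illustrative choice of mine, and the variable names are hypothetical.

```python
import numpy as np

arr = np.array([[6, 5], [6, 9], [7, 5], [7, 9],
                [8, 10], [9, 10], [9, 11], [10, 10]])

# Swap the rows [8, 10] and [10, 10] so the larger x comes first for y = 10.
swapped = arr.copy()
swapped[[4, 7]] = swapped[[7, 4]]

# One-liner without sorting: picks the first row seen for each y.
first_seen = swapped[np.unique(swapped[:, 1], return_index=True)[1]]
print(first_seen)   # the y = 10 row is now [10, 10]

# Sorting by x first restores the smallest-x behavior.
by_x = swapped[swapped[:, 0].argsort(kind="stable")]
smallest_x = by_x[np.unique(by_x[:, 1], return_index=True)[1]]
print(smallest_x)   # the y = 10 row is [8, 10] again
```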

numpy: remove duplicate column values
