Question

I have a dictionary whose keys are str and whose values are np.array. The arrays can also be of type np.str_.

import numpy as np

data = {"col1": np.array((1, 2, 3, 4, 5, 4, 3, 2, 1)),
        "col2": np.array(list("abcdeabcd")),
        "col3": np.array((10, 11, 9, 8, 7, 2, 12, 100, 1))}

How can I sort by multiple keys, each ascending or descending, the way I can with pandas' sort_values method?

Pandas solution (not wanted)

import pandas as pd

df = pd.DataFrame(data)
df.sort_values(by=["col1", "col2"], ascending=[True, True])

NumPy or plain Python solution wanted:

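I don't want to use pandas; ideally I'd use something from NumPy. I know I can use np.lexsort to sort by multiple columns, but it doesn't give me the option of choosing ascending or descending order.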

This is the output I want when sorting by col1 and then col2, both ascending:

{'col1': np.array([1, 1, 2, 2, 3, 3, 4, 4, 5]),
 'col2': np.array(['a', 'd', 'b', 'c', 'b', 'c', 'a', 'd', 'e']),
 'col3': np.array([10, 1, 11, 100, 12, 9, 2, 8, 7])}

This is the output I want when sorting by col1 ascending and then col2 descending:

{'col1': np.array([1, 1, 2, 2, 3, 3, 4, 4, 5]),
 'col2': np.array(['d', 'a', 'c', 'b', 'c', 'b', 'd', 'a', 'e']),
 'col3': np.array([1, 10, 100, 11, 9, 12, 8, 2, 7])}
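Since the question also allows a plain Python solution, here is a minimal sketch (not part of the answer below) based on the fact that Python's list.sort is stable, including with reverse=True. Sorting repeatedly, from the least significant key to the most significant one, yields a multi-key sort with a per-key direction. The helper name sort_dict_py is hypothetical.

```python
import numpy as np

data = {"col1": np.array((1, 2, 3, 4, 5, 4, 3, 2, 1)),
        "col2": np.array(list("abcdeabcd")),
        "col3": np.array((10, 11, 9, 8, 7, 2, 12, 100, 1))}


def sort_dict_py(data, by, ascending):
    """Hypothetical helper: sort a dict of equal-length arrays by several columns.

    Relies on list.sort being stable (reverse=True keeps ties in their
    original order), so the least significant key is sorted first.
    """
    n = len(next(iter(data.values())))
    idx = list(range(n))
    # Walk the sort keys from least to most significant.
    for col, asc in reversed(list(zip(by, ascending))):
        values = data[col]
        idx.sort(key=values.__getitem__, reverse=not asc)
    # Apply the final permutation to every column.
    return {k: v[idx] for k, v in data.items()}


result = sort_dict_py(data, ["col1", "col2"], [True, False])
print(result["col3"].tolist())  # [1, 10, 100, 11, 9, 12, 8, 2, 7]
```

This costs one sort per key, while np.lexsort does the whole job in a single call, but it works for any comparable values without building numeric keys.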

Answer 1

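You can use np.lexsort and sort a column in reverse by negating its sort key. Remember that you don't have to pass the actual arrays to lexsort: any key with the same (or reversed) ordering will do. For signed numeric arrays, you can reverse the order by negating the values. For unsigned integers, you can subtract the values from the maximum of the dtype. Strings can either be viewed as numbers, or you can use np.unique to build a lookup table of signed integers.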
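As a small sketch of that idea using two of the question's columns: the key passed for col3 is its negation rather than col3 itself, which turns it into a descending tie-breaker. Note that np.lexsort treats the last key in the tuple as the primary one.

```python
import numpy as np

col1 = np.array((1, 2, 3, 4, 5, 4, 3, 2, 1))
col3 = np.array((10, 11, 9, 8, 7, 2, 12, 100, 1))

# Last key is primary: sort by col1 ascending, then break ties by
# col3 descending via the negated key.
idx = np.lexsort((-col3, col1))

print(col1[idx].tolist())  # [1, 1, 2, 2, 3, 3, 4, 4, 5]
print(col3[idx].tolist())  # [10, 1, 100, 11, 12, 9, 8, 2, 7]
```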

Here is a small example with one array of each kind:

import numpy as np

np.random.seed(0xBEEF)
a = np.random.choice([1, 2, 3], 10)
b = np.random.choice([1.0, 2.0, 3.0], 10)
c = np.random.choice(np.array([1, 2, 3], dtype=np.uint8), 10)
d = np.random.choice(list('abc'), 10)

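For ascending order, the sort key can be the array itself in every case. For descending order, we can obviously use -a and -b. As it happens, -c works too: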

>>> c
array([3, 1, 1, 3, 2, 2, 1, 1, 2, 3], dtype=uint8)
>>> -c
array([253, 255, 255, 253, 254, 254, 255, 255, 254, 253], dtype=uint8)

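This works because unsigned arithmetic wraps around modulo 2**n, so -c equals 2**n - c for every nonzero element, which reverses their order. Zero, however, maps to itself and would then sort first, so the trick is only safe when the array contains no zeros (as is the case for c here). If you want to be really safe, you can add a check like this: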

if np.issubdtype(c.dtype, np.unsignedinteger):
    # max - c stays within the dtype and maps 0 to max, reversing the order
    key = np.iinfo(c.dtype).max - c
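A quick check of the zero pitfall with a small uint8 array chosen for illustration: plain negation leaves 0 at the front of a "descending" order, while subtracting from the dtype maximum reverses every value. Subtracting from max rather than max + 1 also keeps the scalar within uint8, so NumPy never needs to promote or reject it.

```python
import numpy as np

c = np.array([3, 0, 1, 2], dtype=np.uint8)

# Negation wraps modulo 256: 3 -> 253, 1 -> 255, 2 -> 254, but 0 -> 0.
neg = -c
print(neg.tolist())                                # [253, 0, 255, 254]
print(np.argsort(neg, kind="stable").tolist())     # [1, 0, 3, 2]: the 0 comes first

# max - c never leaves the uint8 range and reverses the order of all values.
key = np.iinfo(c.dtype).max - c
print(key.tolist())                                # [252, 255, 254, 253]
print(np.argsort(key, kind="stable").tolist())     # [0, 3, 2, 1]: 3, 2, 1, 0
```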

And then, of course, there is d:

>>> d
array(['c', 'a', 'b', 'b', 'c', 'b', 'a', 'b', 'c', 'b'], dtype='<U1')
>>> -d
...
UFuncTypeError: ufunc 'negative' did not contain a loop with signature matching types dtype('<U1') -> dtype('<U1')

One way to build a sort key here is:

lookup, key = np.unique(d, return_inverse=True)

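The elements of key are indices into lookup, which np.unique returns in sorted order. That means that sorting key also sorts lookup[key] (which is just d) correctly. In other words, key.argsort() orders d exactly the way d.argsort() does (with kind='stable' the indices are identical), with the added advantage that key can be negated.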
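A short sketch verifying those claims on the d array printed above (reconstructed here from its repr): lookup[key] rebuilds d, a stable argsort of key matches a stable argsort of d, and negating key gives a descending order that the raw strings cannot.

```python
import numpy as np

d = np.array(list("cabbcbabcb"))

lookup, key = np.unique(d, return_inverse=True)
print(lookup.tolist())  # ['a', 'b', 'c']
print(key.tolist())     # [2, 0, 1, 1, 2, 1, 0, 1, 2, 1]

# lookup[key] reconstructs d, and key compares exactly like d does,
# so stable argsorts of the two agree index for index.
assert (lookup[key] == d).all()
assert (np.argsort(key, kind="stable") == np.argsort(d, kind="stable")).all()

# Unlike d, the integer key can be negated for a descending sort.
print(d[np.argsort(-key, kind="stable")].tolist())
# ['c', 'c', 'c', 'b', 'b', 'b', 'b', 'b', 'a', 'a']
```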

In fact, you can take a shortcut and write your key generator using this technique alone:

def make_key(arr, asc=True):
    _, key = np.unique(arr, return_inverse=True)
    if not asc:
        key = np.negative(key, out=key) # Don't bother making a second array
    return key
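Here is a sketch of that shortcut applied to the question's data (make_key is repeated so the snippet runs on its own). Because every column, numeric or not, is first converted to integer ranks, the same function covers both col1 and col2.

```python
import numpy as np


def make_key(arr, asc=True):
    # Rank every value with np.unique; negate the ranks for descending order.
    _, key = np.unique(arr, return_inverse=True)
    if not asc:
        key = np.negative(key, out=key)  # negate in place, no second array
    return key


data = {"col1": np.array((1, 2, 3, 4, 5, 4, 3, 2, 1)),
        "col2": np.array(list("abcdeabcd")),
        "col3": np.array((10, 11, 9, 8, 7, 2, 12, 100, 1))}

# The last key is primary: col1 ascending, then col2 descending.
idx = np.lexsort((make_key(data["col2"], asc=False), make_key(data["col1"])))
result = {k: v[idx] for k, v in data.items()}

print(result["col2"].tolist())  # ['d', 'a', 'c', 'b', 'c', 'b', 'd', 'a', 'e']
print(result["col3"].tolist())  # [1, 10, 100, 11, 9, 12, 8, 2, 7]
```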

So your complete example could look like this:

def custom_lexsort(arrs, asc=True):
    """
    Lexsort a collection of arrays in ascending or descending order.

    Parameters
    ----------
    arrs : sequence[array-like]
        Sequence of arrays to sort.
    asc : array-like[bool]
        Sequence of True for ascending elements of `arrs`,
        False for descending. Must broadcast to `(len(arrs),)`.
    """
    def make_key(a, asc):
        if np.issubdtype(a.dtype, np.number):
            key = a
        else:
            _, key = np.unique(a, return_inverse=True)
        if asc:
            return key
        elif np.issubdtype(key.dtype, np.unsignedinteger):
            return np.iinfo(key.dtype).max - key  # stays in range, reverses order
        else:
            return -key

    n = len(arrs)
    asc = np.broadcast_to(asc, n)
    keys = [make_key(*x) for x in zip(arrs, asc)]
    return np.lexsort(keys[::-1])

data = {"col1": np.array((1, 2, 3, 4, 5, 4, 3, 2, 1)),
        "col2": np.array(list("abcdeabcd")),
        "col3": np.array((10, 11, 9, 8, 7, 2, 12, 100, 1))}

idx = custom_lexsort(list(data.values()), [True, False, True])
result = {k: v[idx] for k, v in data.items()}

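I took the liberty of reversing the order of the keys inside the function, because lexsort treats the last key as the primary one. Sure enough: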

>>> result
{'col1': array([1, 1, 2, 2, 3, 3, 4, 4, 5]),
 'col2': array(['d', 'a', 'c', 'b', 'c', 'b', 'd', 'a', 'e'], dtype='<U1'),
 'col3': array([  1,  10, 100,  11,   9,  12,   8,   2,   7])}
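If you would rather call it the way pandas' sort_values is called, a thin wrapper taking column names is easy to build on the same idea. The sketch below is a hypothetical helper (sort_dict is not a NumPy function); it inlines the key logic of custom_lexsort so it runs on its own, and it also reproduces the ascending/ascending output from the question.

```python
import numpy as np


def sort_dict(data, by, ascending=True):
    """Hypothetical wrapper mimicking df.sort_values(by=..., ascending=...)
    for a dict of equal-length 1-D NumPy arrays."""
    asc = np.broadcast_to(ascending, len(by))
    keys = []
    for col, a in zip(by, asc):
        arr = np.asarray(data[col])
        if np.issubdtype(arr.dtype, np.number):
            key = arr
        else:
            # Non-numeric columns (e.g. strings) are ranked with np.unique.
            _, key = np.unique(arr, return_inverse=True)
        if not a:
            if np.issubdtype(key.dtype, np.unsignedinteger):
                key = np.iinfo(key.dtype).max - key
            else:
                key = -key
        keys.append(key)
    # np.lexsort uses the last key as primary, so reverse the list.
    idx = np.lexsort(keys[::-1])
    return {k: np.asarray(v)[idx] for k, v in data.items()}


data = {"col1": np.array((1, 2, 3, 4, 5, 4, 3, 2, 1)),
        "col2": np.array(list("abcdeabcd")),
        "col3": np.array((10, 11, 9, 8, 7, 2, 12, 100, 1))}

print(sort_dict(data, ["col1", "col2"])["col3"].tolist())
# [10, 1, 11, 100, 12, 9, 2, 8, 7]
print(sort_dict(data, ["col1", "col2"], [True, False])["col3"].tolist())
# [1, 10, 100, 11, 9, 12, 8, 2, 7]
```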

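I included the third column in the sort because it does no harm. Here is an example that sorts the arrays in place, using only the first two columns as keys: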

idx = custom_lexsort([data['col1'], data['col2']], [True, False])
for v in data.values():
    v[:] = v[idx]
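The in-place assignment is safe because v[idx] (fancy indexing) builds a new array before anything is written back into v. Since the existing arrays are overwritten, any other reference to them sees the sorted data as well. A small check, using plain np.lexsort for the all-ascending case (alias is just an illustrative second reference):

```python
import numpy as np

data = {"col1": np.array((1, 2, 3, 4, 5, 4, 3, 2, 1)),
        "col2": np.array(list("abcdeabcd")),
        "col3": np.array((10, 11, 9, 8, 7, 2, 12, 100, 1))}
alias = data["col3"]  # a second reference to the same array object

# All ascending: col1 is primary (last key), col2 breaks ties.
idx = np.lexsort((data["col2"], data["col1"]))
for v in data.values():
    # v[idx] is a copy, so overwriting v with it afterwards is safe.
    v[:] = v[idx]

print(alias.tolist())  # [10, 1, 11, 100, 12, 9, 2, 8, 7]
```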

Dictionary: sort by multiple keys, descending/ascending
