Question

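How can I split a 2D array by a grouping variable and return a list of arrays (the order matters too)?

To show the expected result, the equivalent in R can be done as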

> (A = matrix(c("a", "b", "a", "c", "b", "d"), nr=3, byrow=TRUE)) # input
     [,1] [,2]
[1,] "a"  "b" 
[2,] "a"  "c" 
[3,] "b"  "d" 
> (split.data.frame(A, A[,1])) # output
$a
     [,1] [,2]
[1,] "a"  "b" 
[2,] "a"  "c" 

$b
     [,1] [,2]
[1,] "b"  "d"

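Edit: To clarify: I would like to split the array A into a list of multiple arrays based on the unique values in the first column. That is, split A into one array where the first column is a and another array where the first column is b.

I have tried Python equivalent of R "split"-function, but this gives three arrays:

import numpy as np
import itertools

A = np.array([["a", "b"], ["a", "c"], ["b", "d"]])
b = A[:, 0]

def split(x, f):
    return list(itertools.compress(x, f)), list(itertools.compress(x, (not i for i in f)))

split(A, b)

([array(['a', 'b'], dtype='<U1'),
  array(['a', 'c'], dtype='<U1'),
  array(['b', 'd'], dtype='<U1')],
 [])

I also tried numpy.split, but it needs integers. I thought I might be able to use How to convert strings into integers in Python? to convert the letters to integers, but even if I pass integers, it does not split as expected: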

np.split(A, b)

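Can this be done? Thanks.

Edit: Note that this is just a small example; the number of groups may be greater than two, and the groups may not be ordered.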

Answer 1

You can use pandas:

import pandas as pd
import numpy as np

a = np.array([["a", "b"], ["a", "c"], ["b", "d"]])

listofdfs = {}
for n,g in pd.DataFrame(a).groupby(0):
    listofdfs[n] = g

listofdfs['a'].values

Output:

array([['a', 'b'],
       ['a', 'c']], dtype=object)

And then:

listofdfs['b'].values

Output:

array([['b', 'd']], dtype=object)
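
The same "dictionary of groups" idea can also be sketched without pandas, using a boolean mask per key. This is only a minimal illustration: the helper name group_rows and its col parameter are made up for this example, and it relies on Python dictionaries keeping insertion order (Python 3.7+), so the groups come out in order of first appearance rather than sorted.

```python
import numpy as np

def group_rows(arr, col=0):
    """Map each distinct value in column `col` to the sub-array of matching rows.

    Hypothetical helper for illustration; keys appear in order of first occurrence.
    """
    groups = {}
    for key in arr[:, col]:
        key = str(key)  # plain str keys are easier to look up than NumPy scalars
        if key not in groups:
            # Boolean mask selects every row whose key column equals this value
            groups[key] = arr[arr[:, col] == key]
    return groups

# Groups are deliberately not contiguous and not sorted here
A = np.array([["b", "d"], ["a", "b"], ["b", "e"], ["a", "c"]])
groups = group_rows(A)
list_of_arrays = list(groups.values())
```

Each key costs one full pass over the array, so this is likely fine for a handful of groups but not ideal for very many.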

Alternatively, you can use itertools groupby:

import numpy as np
from itertools import groupby
l = [np.stack(list(g)) for k, g in groupby(a, lambda x: x[0])]

l[0]

Output:

array([['a', 'b'],
       ['a', 'c']], dtype='<U1')

And then:

l[1]

Output:

array([['b', 'd']], dtype='<U1')
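
One caveat: itertools.groupby only merges consecutive rows with equal keys, so the one-liner above assumes the rows are already grouped together. Since the question says the groups may not be ordered, a possible sketch is to sort by the key column first. The function name split_sorted_groupby is invented for this example, and a stable sort is used so that rows keep their original relative order within each group.

```python
import numpy as np
from itertools import groupby

def split_sorted_groupby(arr, col=0):
    """Sort rows by column `col` (stable), then group consecutive equal keys.

    Hypothetical helper for illustration; returns a list of 2D arrays.
    """
    order = np.argsort(arr[:, col], kind="stable")  # stable keeps within-group order
    sorted_rows = arr[order]
    return [np.stack(list(g)) for _, g in groupby(sorted_rows, key=lambda row: row[col])]

# Rows for "a" and "b" are interleaved on purpose
A = np.array([["b", "d"], ["a", "b"], ["a", "c"]])
parts = split_sorted_groupby(A)
```

The resulting list is ordered by sorted key, which matches what R's split returns for this input.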

Answer 2

If I understand your question, you can do simple slicing, as follows:

import numpy as np

a = np.array([["a", "b"], ["a", "c"], ["b", "d"]])

x,y=a[:2,:],a[2,:]

x
array([['a', 'b'],
       ['a', 'c']], dtype='<U1')

y
array(['b', 'd'], dtype='<U1')
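
The slice above hard-codes the group boundary at row 2, and y comes back as a 1D array. For the general case, the boundaries can be computed and then passed to np.split, which expects split positions (integer indices) rather than group labels. The following is a rough sketch under that reading; split_by_key is a name chosen for this example only.

```python
import numpy as np

def split_by_key(arr, col=0):
    """Split `arr` into a list of 2D arrays, one per distinct value in column `col`.

    Hypothetical helper for illustration; groups are returned in sorted key order.
    """
    order = np.argsort(arr[:, col], kind="stable")  # bring equal keys together
    sorted_arr = arr[order]
    # return_index gives the first position of each unique key in the sorted array,
    # i.e. the row where each group starts
    _, starts = np.unique(sorted_arr[:, col], return_index=True)
    # Split before every group start except the first one (which is row 0)
    return np.split(sorted_arr, starts[1:])

A = np.array([["b", "d"], ["a", "b"], ["a", "c"]])
pieces = split_by_key(A)
```

Unlike the manual slice, every piece stays two-dimensional, even when a group contains a single row.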
