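I don't know the exact technical term for what I want to do, so I'll try to demonstrate it with an example:
I have two vectors of the same length, a and b, like this: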
In [41]:a
Out[41]:
array([ 0.61689215, 0.31368813, 0.47680184, ..., 0.84857976,
0.97026244, 0.89725481])
In [42]:b
Out[42]:
array([35, 36, 37, ..., 36, 37, 38])
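a contains N floats, and b contains N keys that take 10 distinct values: 35, 36, 37, ..., 43, 44.
I would like to get a new matrix M with 10 columns, where the first column contains all rows of a whose corresponding key in b is 35, the second column contains all rows of a whose corresponding key in b is 36, and so on for all 10 columns of M.
I hope this is clear. Thanks.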
Answer 0: (score: 1)
itertools.groupby can be used to group the values (after sorting). Using numpy arrays is optional.
import numpy as np
import itertools
N=50
# a = np.random.rand(50)*100
a = np.random.randint(0,100,N) # int to make printing more compact
b = np.random.randint(35,45, N)
# make structured array to easily sort both arrays together
dtype = np.dtype([('a',float),('b',int)])
ab = np.ndarray(a.shape,dtype=dtype)
ab['a'] = a
ab['b'] = b
# ab = np.sort(ab,order=['b']) # sorts both 'b' and 'a'
I = np.argsort(b,kind='mergesort') # preserves order
ab = ab[I]
# now group, and extract lists of lists
gp = itertools.groupby(ab, lambda x: x['b'])
xx = [list(x[1]) for x in gp]
# print(np.array([[y[0] for y in x] for x in xx]))  # list of lists
def filled(x):
    # pad every sublist with NaN up to the length of the longest one
    M = max(len(z) for z in x)
    return np.array([z + [np.nan] * (M - len(z)) for z in x])
print(filled([[y[1] for y in x] for x in xx]).T)
print(filled([[y[0] for y in x] for x in xx]).T)
This produces:
[[ 35. 36. 37. 38. 39. 40. 41. 42. 43. 44.]
[ 35. 36. 37. 38. 39. 40. 41. 42. 43. 44.]
[ nan 36. 37. nan 39. 40. 41. 42. 43. 44.]
[ nan 36. 37. nan 39. 40. 41. 42. 43. 44.]
...]
[[ 54. 69. 34. 28. 71. 53. 33. 19. 64. 56.]
[ 90. 52. 11. 9. 50. 53. 25. 37. 69. 56.]
[ nan 97. 31. nan 69. 35. 2. 80. 91. 54.]
[ nan 33. 87. nan 47. 90. 81. 45. 86. 57.]
...]
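I am using argsort with mergesort to preserve the order of a within each sublist. np.sort sorts lexically on both b and a (contrary to what I expected from the order parameter).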
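To make the difference concrete, here is a small sketch on made-up data (the arrays and the names idx, rec, stable_a and lexical_a are only for illustration). It compares a stable argsort on the keys with np.sort on a structured array; as far as I understand the NumPy docs, fields not listed in order are still used to break ties, which is why the values in a end up sorted too.

```python
import numpy as np

# Made-up data: two keys (35 and 36), values deliberately out of order.
a = np.array([5.0, 1.0, 4.0, 2.0, 3.0])
b = np.array([36, 35, 36, 35, 36])

# Stable argsort on the keys only: equal keys keep their original order in a.
idx = np.argsort(b, kind='mergesort')
stable_a = a[idx]

# Structured sort on field 'b': ties are broken by the remaining field 'a',
# so the values inside each key group come out sorted.
rec = np.empty(a.shape, dtype=[('a', float), ('b', int)])
rec['a'] = a
rec['b'] = b
lexical_a = np.sort(rec, order=['b'])['a']

print(stable_a)   # values for key 35 first, in their original order
print(lexical_a)  # values sorted within each key group
```

If the original order inside each column does not matter, the plain np.sort version is the shorter option.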
Another approach is to use a Python dictionary, which also preserves the order of a. It may be slower on large arrays, but it hides fewer details:
import collections
d = collections.defaultdict(list)
for k, v in zip(b, a):
    d[k].append(v)
values = [d[k] for k in sorted(d.keys())]
print(filled(values).T)
Answer 1: (score: 0)
You can use pandas:
import numpy as np
import pandas as pd
a = np.random.rand(50)
b = np.random.randint(10, 15, 50)
s = pd.Series(a)
s.groupby(b).apply(pd.Series.reset_index, drop=True).unstack(level=0)
The output is:
10 11 12 13 14
0 0.465079 0.041393 0.692856 0.634328 0.179690
1 0.934678 0.746048 0.060014 0.072626 0.824729
2 0.388190 0.510527 0.078662 0.077157 0.291183
3 0.972033 0.761159 0.017317 0.104768 0.278871
4 0.750713 0.430246 0.083407 0.262037 0.487742
5 0.216965 0.482364 0.820535 0.207008 0.276452
6 0.282038 0.607303 0.675856 0.994369 0.602059
7 0.897106 0.398808 0.312332 0.751388 0.878177
8 0.229121 NaN NaN 0.061288 0.032066
9 0.810678 NaN NaN NaN 0.718237
10 0.571125 NaN NaN NaN 0.668292
11 0.410750 NaN NaN NaN 0.288145
12 0.984507 NaN NaN NaN NaN
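If the one-liner above is hard to follow, the same reshaping can probably be spelled out in two steps. This is only a sketch on made-up data, and the column names 'a', 'b' and 'pos' are my own choice: groupby(...).cumcount() numbers the rows inside each key group, and pivot then spreads the values into one column per key, leaving NaN where a group is shorter.

```python
import numpy as np
import pandas as pd

# Made-up data: key 10 occurs twice, key 11 three times.
a = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
b = np.array([11, 10, 11, 10, 11])

df = pd.DataFrame({'a': a, 'b': b})
# Position of each row within its own key group: 0, 1, 2, ...
df['pos'] = df.groupby('b').cumcount()
# One column per key, one row per within-group position; gaps become NaN.
M = df.pivot(index='pos', columns='b', values='a')
print(M)
```

M.to_numpy() gives the plain matrix if the column labels are not needed.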
Answer 2: (score: 0)
Here is an approach without pandas (so you need to keep track of the column labels yourself):
import numpy as np
from itertools import zip_longest  # named izip_longest on Python 2
from collections import defaultdict
a = np.random.rand(50)
b = np.random.randint(10, 15, 50)
d = defaultdict(list)
for i, key_val in enumerate(b):
    d[key_val].append(a[i])
# iterate over the keys in sorted order so the columns correspond to 10, 11, ..., 14
output = np.asarray(list(zip_longest(*(d[k] for k in sorted(d)),
                                     fillvalue=np.nan)))
print(a)
print(b)
print(output)
This gives:
a:
array([ 0.98688273, 0.95584584, 0.91011945, 0.56402919, 0.86185936,
0.09380343, 0.69290659, 0.97238284, 0.81297425, 0.73446398,
0.25927151, 0.44622982, 0.20537961, 0.61665218, 0.90168399,
0.58556404, 0.47017152, 0.32278718, 0.15044929, 0.07859976,
0.26715756, 0.38281878, 0.30169241, 0.47785937, 0.15377038,
0.93395325, 0.79099068, 0.92471442, 0.03154578, 0.0437627 ,
0.31711433, 0.78550517, 0.77062104, 0.76002167, 0.1842867 ,
0.52935392, 0.16038216, 0.46510856, 0.4311615 , 0.73923847,
0.45499238, 0.2630405 , 0.67722848, 0.1391463 , 0.50800704,
0.50618842, 0.19540159, 0.38150066, 0.82831838, 0.3383787 ])
b:
array([14, 10, 13, 12, 12, 13, 13, 12, 11, 10, 10, 13, 14, 12, 11, 12, 14,
12, 12, 14, 11, 10, 13, 13, 13, 10, 14, 11, 13, 11, 11, 11, 12, 10,
11, 11, 14, 12, 12, 14, 13, 10, 11, 14, 13, 11, 10, 11, 12, 12])
output:
array([[ 0.95584584, 0.81297425, 0.56402919, 0.91011945, 0.98688273],
[ 0.73446398, 0.90168399, 0.86185936, 0.09380343, 0.20537961],
[ 0.25927151, 0.26715756, 0.97238284, 0.69290659, 0.47017152],
[ 0.38281878, 0.92471442, 0.61665218, 0.44622982, 0.07859976],
[ 0.93395325, 0.0437627 , 0.58556404, 0.30169241, 0.79099068],
[ 0.76002167, 0.31711433, 0.32278718, 0.47785937, 0.16038216],
[ 0.2630405 , 0.78550517, 0.15044929, 0.15377038, 0.73923847],
[ 0.19540159, 0.1842867 , 0.77062104, 0.03154578, 0.1391463 ],
[ nan, 0.52935392, 0.46510856, 0.45499238, nan],
[ nan, 0.67722848, 0.4311615 , 0.50800704, nan],
[ nan, 0.50618842, 0.82831838, nan, nan],
[ nan, 0.38150066, 0.3383787 , nan, nan]])
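Since the column labels are not stored in the array, one way to track them is to keep a sorted list of keys next to it. A minimal sketch on made-up data follows; groups and labels are names I picked for illustration, not part of any library.

```python
from collections import defaultdict
from itertools import zip_longest

import numpy as np

# Made-up data: key 14 occurs three times, key 10 twice.
a = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
b = np.array([14, 10, 14, 10, 14])

# Collect the values of a per key, in their original order.
groups = defaultdict(list)
for key, value in zip(b.tolist(), a.tolist()):
    groups[key].append(value)

# labels[j] is the key that column j of output belongs to.
labels = sorted(groups)
output = np.array(list(zip_longest(*(groups[k] for k in labels),
                                   fillvalue=np.nan)))

print(labels)
print(output)
# Look up one column by its key (NaN marks padding in shorter columns).
print(output[:, labels.index(14)])
```

Sorting the keys explicitly also avoids relying on dictionary ordering, which differs between Python versions.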