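I have two 2-D ndarrays, A and B.
A contains the panel values of a factor: rows represent days and columns represent different regions. A has ~3000 columns and ~5000 rows. For example: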
A = array([[ 3.53,  3.56, nan, ...,   nan,   nan, nan],   # day 1 data
           [-4.91, -2.54, nan, ...,   nan,   nan, nan],   # day 2 data
           [-6.31, -3.39, nan, ...,   nan,   nan, nan],   # day 3 data, etc
           ...,
           [ 0.  , -3.41, nan, ..., 12.69,  2.32, nan],
           [-2.74, -4.14, nan, ..., -8.63, -1.45, nan],
           [-1.74, -7.45, nan, ...,  0.68, -6.52, nan]])
B contains the type of each corresponding value in A. There are about 30 types in total. For example:
B = array([[ 'A', 'B', nan, ..., nan, nan, nan],   # day 1 type
           [ 'A', 'A', nan, ..., nan, nan, nan],   # day 2 type, etc
           ...,
           [ 'D', 'E', nan, ..., 'I', 'D', nan],
           [ 'X', 'Y', nan, ..., 'O', 'S', nan]])
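The goal: for each day (row), the regions should be split into 10 groups according to their values (group 10 > group 9 > ...). Within each group, the weight of each type should equal (total number of that type in the row) / 10. For example,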
day 1:
# of A: 35 --> weight of A in each group: 3.5
# of B: 33 --> weight of B in each group: 3.3
...
# of Z: 6 --> weight of Z in each group: 0.6
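To make the target weights concrete, here is a minimal sketch that counts the types in a single row of B and divides by the number of groups. The helper name `target_type_weights` is made up for illustration, and it assumes type labels are strings while missing entries are `nan`.

```python
import numpy as np

def target_type_weights(type_row, n_groups=10):
    """Return {type: count / n_groups} for one row of B.

    Assumption: real type labels are strings; anything else (e.g. nan)
    is treated as missing and skipped.
    """
    labels = [t for t in type_row if isinstance(t, str)]
    uniq, counts = np.unique(labels, return_counts=True)
    return {str(u): float(c) / n_groups for u, c in zip(uniq, counts)}
```

With 35 entries of type 'A' in a row, this gives a per-group weight of 3.5 for 'A', matching the day 1 example above.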
The result should be
weight_group_1 = array([[ 1, 1, nan, ..., 0.5, ..., 1, ..., nan, nan, nan]
# And the sum of each group's weights should be equal, if all steps correct.
weight_group_2 = array([[ 0, 0, nan, ..., 1, ..., 0.3, ..., nan, nan, nan]
and so on
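For reference, one possible reading of the requirement can be sketched for a single row as follows. This is only a sketch under stated assumptions, not a confirmed solution: it assumes regions are ranked by value within each type, that each region carries a total weight of 1, and that a region straddling a group boundary is split fractionally between the two neighbouring groups. The function name `group_weights_for_row` is hypothetical, and ties are broken by column order.

```python
import numpy as np

def group_weights_for_row(values, types, n_groups=10):
    """Fractional group weights for one row (one day).

    Returns an array of shape (n_groups, n_columns). Row g holds the
    weight of every column in group g+1; columns whose value is nan
    stay nan. Assumption: ranking is done within each type, ascending,
    so the last group holds the highest values.
    """
    values = np.asarray(values, dtype=float)
    types = np.asarray(types, dtype=object)
    weights = np.full((n_groups, values.size), np.nan)

    valid_idx = np.flatnonzero(~np.isnan(values))
    weights[:, valid_idx] = 0.0

    # Encode the type labels of the valid columns as integers.
    labels, inverse = np.unique(types[valid_idx].astype(str),
                                return_inverse=True)
    for k in range(labels.size):
        idx = valid_idx[inverse == k]              # columns of this type
        order = idx[np.argsort(values[idx], kind="stable")]
        n = order.size
        ranks = np.arange(n)                       # item i covers [i, i+1)
        edges = np.arange(n_groups + 1) * n / n_groups
        # Overlap of each item's unit interval with each group's interval.
        lo = np.maximum(ranks[None, :], edges[:-1, None])
        hi = np.minimum(ranks[None, :] + 1, edges[1:, None])
        weights[:, order] = np.clip(hi - lo, 0.0, None)
    return weights
```

Calling this once per row, e.g. `group_weights_for_row(A[i], B[i])`, yields all 10 weight rows for day i. It still loops over rows and types in Python, so with ~5000 rows and ~30 types it is not fully vectorized.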
Is there an efficient algorithm to achieve this? Please help, and thanks in advance!