Array splitting task: based on values and custom types

Asked: 2018-06-01 01:31:39

Tags: python python-3.x numpy data-science

I have two 2-D ndarrays, A and B.

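A contains the panel values of a factor: rows represent days and columns represent different regions. A has ~3000 columns and ~5000 rows. For example: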

A = array([[ 3.53, 3.56, nan, ..., nan, nan, nan], # day 1 data
   [-4.91, -2.54, nan, ..., nan, nan, nan], # day 2 data
   [-6.31, -3.39, nan, ..., nan, nan, nan], # day 3 data, etc
   ..., 
   [ 0.  , -3.41, nan, ..., 12.69, 2.32, nan],
   [-2.74, -4.14, nan, ..., -8.63, -1.45, nan],
   [-1.74, -7.45, nan, ..., 0.68, -6.52, nan]])

B contains the type of each corresponding value in A. There are about 30 types in total. For example:

B = array([[  'A', 'B', nan, ..., nan, nan, nan], # day 1 type
           [  'A', 'A', nan, ..., nan, nan, nan], # day 2 type, etc
           ...,
           [  'D', 'E', nan, ..., 'I', 'D', nan],
           [  'X', 'Y', nan, ..., 'O', 'S', nan]])

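The goal is, for each day (row), to split the regions into 10 groups according to their values (values in group 10 > values in group 9 > ...). Within each group, the weight of each type should equal (total number of that type in the row) / 10. For example,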

 day 1: 
       # of A: 35    -->  weight of A in each group: 3.5
       # of B: 33    -->  weight of B in each group: 3.3
       ...
       # of Z: 6     -->  weight of Z in each group: 0.6 
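
As a minimal sketch of the per-type target weights listed above: the helper name `type_weights_for_row` is hypothetical, and the code assumes that a region counts toward its type only when its value in A is not NaN, and that the corresponding row of B is an object array of strings with NaN in the empty slots.

```python
import numpy as np

def type_weights_for_row(a_row, b_row, n_groups=10):
    """Return {type: count / n_groups} for one day (hypothetical helper)."""
    a_row = np.asarray(a_row, dtype=float)
    b_row = np.asarray(b_row, dtype=object)
    # Assumption: only regions with a non-NaN value in A are counted.
    valid = ~np.isnan(a_row)
    # Convert labels to str so np.unique can sort them reliably.
    types, counts = np.unique(b_row[valid].astype(str), return_counts=True)
    return dict(zip(types.tolist(), (counts / n_groups).tolist()))
```

With 35 regions of type A in a row, this gives a weight of 3.5 for A in each group, matching the day 1 figures.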

The result should be

weight_group_1 = array([[  1,  1,  nan, ..., 0.5, ..., 1, ..., nan, nan, nan]
# The sum of each group's weights should be equal if all steps are correct.
weight_group_2 = array([[  0,  0,  nan, ..., 1, ..., 0.3, ..., nan, nan, nan]
and so on
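
One possible reading of this output is a type-neutral fractional split. It is sketched below under stated assumptions, not as a confirmed solution. Within each type, regions are ranked by value in ascending order, so the region at rank k occupies the interval [k, k+1]. Group g covers the interval [g*n/10, (g+1)*n/10], where n is the number of regions of that type. A region's weight in a group is the length of the overlap between the two intervals. The function name `row_group_weights` and the NaN handling are assumptions of this sketch.

```python
import numpy as np

def row_group_weights(a_row, b_row, n_groups=10):
    """Fractional group weights for one day, shape (n_groups, n_regions).

    Row 0 is the lowest-value group ("group 1"), row n_groups - 1 the highest.
    """
    a_row = np.asarray(a_row, dtype=float)
    b_row = np.asarray(b_row, dtype=object)
    weights = np.full((n_groups, a_row.size), np.nan)
    valid = ~np.isnan(a_row)          # assumption: NaN in A means "no region"
    weights[:, valid] = 0.0
    idx_valid = np.flatnonzero(valid)
    labels = b_row[idx_valid].astype(str)
    for t in np.unique(labels):
        members = idx_valid[labels == t]
        n = members.size
        # Column indices of this type, sorted by value (ascending).
        order = members[np.argsort(a_row[members], kind="stable")]
        rank_lo = np.arange(n, dtype=float)[None, :]      # interval [k, k+1]
        rank_hi = rank_lo + 1.0
        edges = np.arange(n_groups + 1) * n / n_groups    # group boundaries
        grp_lo = edges[:-1, None]
        grp_hi = edges[1:, None]
        # Overlap length between each rank interval and each group interval.
        overlap = np.minimum(rank_hi, grp_hi) - np.maximum(rank_lo, grp_lo)
        weights[:, order] = np.clip(overlap, 0.0, None)
    return weights
```

Under this construction each region's weights sum to 1 across groups, and each type contributes exactly n / n_groups to every group, so all groups end up with the same total weight. For all days, something like `np.stack([row_group_weights(a, b) for a, b in zip(A, B)], axis=1)` would produce an array of shape (10, n_days, n_regions). The loop runs over days and types only, while the work per type is vectorized.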

Is there an efficient algorithm to achieve this? Please help, and thanks in advance!

0 answers:

No answers