Inserting original data into randomly generated data in Python

Time: 2018-03-08 15:23:58

Tags: python python-3.x numpy

Suppose I have an np.array of 3D points (10x3, one point per row):

orgArr = [[  30.1678 -173.569   725.724 ]
 [  29.9895 -173.34    725.76  ]
 [  29.9411 -173.111   725.768 ]
 [  29.9306 -173.016   725.98  ]
 [  29.6754 -172.621   725.795 ]
 [  29.5277 -172.274   725.903 ]
 [  29.585  -171.978   726.111 ]
 [  29.4114 -171.507   726.188 ]
 [  29.3951 -170.947   726.173 ]
 [  29.3577 -170.196   726.384 ]]

For each column, I generate random numbers between that column's minimum and maximum. For example, for the first column:

import random
import numpy as np

def create_random_floats(low, high, size):
    # Return `size` uniform random floats between low and high
    return [random.uniform(low, high) for _ in range(size)]

# Find min/max
colXMin = np.min(orgArr[:, 0])
colXMax = np.max(orgArr[:, 0])

# Generate random numbers between min/max
size = 12
addRandomToColX = create_random_floats(colXMin, colXMax, size)

# Sort the random numbers (descending for column x)
sortRandomColX = sorted(addRandomToColX, reverse=True)
print('sortRandomColX:', sortRandomColX)

# Do same for cols y and z
...

# Finally, create the 12x3 array
randomArr = np.array([sortRandomColX, sortRandomColY, sortRandomColZ]).T
print('randomArr:', randomArr)
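
The per-column steps described above can also be written in vectorized form. This is only a minimal sketch, not the asker's code: the helper name `random_sorted_like`, the seeded `np.random.default_rng`, and the two-row stand-in for orgArr are assumptions made for illustration.

```python
import numpy as np

def random_sorted_like(org, size, rng):
    """Draw `size` rows uniformly within each column's min/max and sort
    every column in the direction given by org's first and last row."""
    low = org.min(axis=0)
    high = org.max(axis=0)
    # low/high broadcast against the (size, n_cols) output shape
    rand = rng.uniform(low, high, size=(size, org.shape[1]))
    rand.sort(axis=0)  # ascending, column by column
    descending = org[0] > org[-1]
    # flip only the columns that should run from high to low
    rand[:, descending] = rand[::-1][:, descending]
    return rand

# Stand-in data: the first and last row of orgArr
org_arr = np.array([[30.1678, -173.569, 725.724],
                    [29.3577, -170.196, 726.384]])
random_arr = random_sorted_like(org_arr, 12, np.random.default_rng(0))
print(random_arr)
```

With the stand-in data, column x comes out decreasing and columns y and z increasing, which matches the ordering described in the question.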

So I get a 12x3 array, sorted in the given order: column x decreases, while y and z increase:

randomArr: 
[[  30.16564103 -173.45321119  725.74404996]
 [  30.03986524 -173.17110927  725.84951132]
 [  29.97088507 -173.15435901  725.85341553]
 [  29.79273295 -172.76247176  725.97347288]
 [  29.53294671 -170.90169722  726.27944054]
 [  29.53182418 -170.88261603  726.34089036]
 [  29.52163245 -170.72931883  726.34411865]
 [  29.50194557 -170.71866152  726.34946239]
 [  29.45834997 -170.68671434  726.36413176]
 [  29.4426014  -170.57381107  726.37110357]
 [  29.43702889 -170.40826716  726.45476367]
 [  29.3621429  -169.77240546  726.51968671]]

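How can I randomly re-insert/mix the orgArr data into randomArr so that it is distributed over the whole length? I mean not just at the beginning or the end of randomArr, and without destroying the sort order of the individual columns.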

1 Answer:

Answer 0: (score: 2)

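Here is one solution that relies mainly on numpy. I am not happy with the part that reverses the column order, though, but the sort does not take an array as an argument.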

import numpy as np
orgArr = np.asarray([[  30.1678, -173.569,   725.724 ],
                     [  29.9895, -173.34,    725.76  ],
                     [  29.9411, -173.111,   725.768 ],
                     [  29.9306, -173.016,   725.98  ],
                     [  29.6754, -172.621,   725.795 ],
                     [  29.5277, -172.274,   725.903 ],
                     [  29.585,  -171.978,   726.111 ],
                     [  29.4114, -171.507,   726.188 ],
                     [  29.3951, -170.947,   726.173 ],
                     [  29.3577, -170.196,   726.384 ]])
#number of rows to add
n2add = 12
#min/max for each column
orgMin = np.min(orgArr, axis = 0)
orgMax = np.max(orgArr, axis = 0)
#generate array with random values between min/max of each column 
randomArr = (orgMax - orgMin) * np.random.random((n2add + orgArr.shape[0], orgArr.shape[1])) + orgMin
#insert original values
randomArr[:orgArr.shape[0], :] =  orgArr
#sort values
randomArr.sort(axis = 0)
#determines for each column, if direction of order in orgArr is the same as in randomArr
#and reverses column order, if not
col_ord = np.sign((orgArr[0,:] - orgArr[-1,:])) * np.sign((randomArr[0,:] - randomArr[-1,:]))
for i in range(orgArr.shape[1]):
    if col_ord[i] < 0:
        randomArr[:,i] = randomArr[::-1,i]
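
If the reversal loop feels clumsy, one possible alternative is to multiply each column by a sign before and after sorting. This is a sketch under assumptions: `sort_like` is a hypothetical helper (not part of numpy), and as in the code above, each column's direction is taken from the first and last row of the reference array.

```python
import numpy as np

def sort_like(values, reference):
    """Sort each column of `values` ascending or descending, matching the
    overall direction (first row vs. last row) of `reference`."""
    # -1 for columns that decrease, +1 for columns that increase or stay flat
    signs = np.where(reference[0] > reference[-1], -1.0, 1.0)
    # negating a column turns an ascending sort into a descending one
    return np.sort(values * signs, axis=0) * signs

reference = np.array([[3.0, 1.0], [1.0, 3.0]])   # column 0 falls, column 1 rises
values = np.array([[2.0, 2.5], [3.0, 1.0], [1.0, 3.0], [2.5, 2.0]])
result = sort_like(values, reference)
print(result)
```

With orgArr and the filled randomArr from above, the call would be `sort_like(randomArr, orgArr)` in place of the `sort` call and the loop.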

Example output:

#randomArr
[[  30.1678     -173.569       725.724     ]
 [  30.11384713 -173.34        725.76      ]
 [  30.02906243 -173.23713466  725.768     ]
 [  29.9895     -173.111       725.795     ]
 [  29.94555434 -173.016       725.83462631]
 [  29.9411     -172.78230979  725.903     ]
 [  29.9306     -172.6898037   725.95312697]
 [  29.92622676 -172.621       725.98      ]
 [  29.91989733 -172.44033232  726.01484565]
 [  29.91581341 -172.42239247  726.08304636]
 [  29.89624414 -172.30021976  726.08525885]
 [  29.84977922 -172.29533928  726.08784464]
 [  29.80493116 -172.274       726.10620276]
 [  29.6754     -172.03366934  726.111     ]
 [  29.63979452 -171.978       726.14750753]
 [  29.585      -171.67822537  726.1535495 ]
 [  29.5277     -171.507       726.173     ]
 [  29.49315771 -171.33446469  726.18671858]
 [  29.42592778 -171.15097712  726.188     ]
 [  29.4114     -170.947       726.24372921]
 [  29.3951     -170.87844982  726.29369897]
 [  29.3577     -170.196       726.384     ]]
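
A result like this can be checked in a few lines. The sketch below repeats the approach on three rows of orgArr (the seed, `n_add` and the variable names are assumptions for illustration) and tests two things: every original value is still present in its column, and every column is monotonic. Because each column is sorted independently, original values are not guaranteed to stay together as (x, y, z) rows, which is also visible in the output above.

```python
import numpy as np

rng = np.random.default_rng(1)
org = np.array([[30.1678, -173.569, 725.724],
                [29.9895, -173.34, 725.76],
                [29.3577, -170.196, 726.384]])  # a few rows of orgArr
n_add = 4

low, high = org.min(axis=0), org.max(axis=0)
mixed = rng.uniform(low, high, size=(n_add + len(org), org.shape[1]))
mixed[:len(org)] = org                 # overwrite the first rows with the originals
mixed.sort(axis=0)                     # sort each column independently
mixed[:, 0] = mixed[::-1, 0].copy()    # column x should decrease

# every original value is still present in its own column
values_kept = all(np.isin(org[:, j], mixed[:, j]).all()
                  for j in range(org.shape[1]))
# every column is monotonic (all steps >= 0 or all steps <= 0)
steps = np.diff(mixed, axis=0)
monotonic = bool(((steps >= 0).all(axis=0) | (steps <= 0).all(axis=0)).all())
# number of original rows that survive as complete rows
rows_kept = int(sum((mixed == row).all(axis=1).any() for row in org))
print(values_kept, monotonic, rows_kept)
```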

You can also use other distribution functions with this script; numpy and scipy provide a wide variety of them. For example, with np.random.normal(mu, sigma, n):

randomArr = (orgMax - orgMin) * np.random.normal(0.1, 0.001, (n2add + orgArr.shape[0], orgArr.shape[1])) + orgMin

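If you now look at the output array, you will notice that the newly generated values (recognizable by their larger number of digits) are found at one end of the array. Take care, however, that the values drawn from your distribution function stay within the limits (0, 1). The example mu = 0.1, sigma = 0.001 is deliberately extreme to show the effect on the final distribution. 0.5/0.2 works fine for the most part, but for 0.5/5 you will not get an error message even though the values will exceed the initial range.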
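
Since numpy does not raise an error here, the range has to be checked or enforced by hand. The sketch below measures how many normal draws fall outside [0, 1] for the three parameter pairs mentioned and then clips the scaled values back into the original range. Using `np.clip` is an assumption about what is acceptable: it piles values up at the column minimum and maximum, so redrawing out-of-range samples may be preferable. The seed and the sample size of 1000 are also chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
org_min = np.array([29.3577, -173.569, 725.724])   # column minima of orgArr
org_max = np.array([30.1678, -170.196, 726.384])   # column maxima of orgArr

fractions = []
for mu, sigma in [(0.1, 0.001), (0.5, 0.2), (0.5, 5.0)]:
    unit = rng.normal(mu, sigma, size=(1000, 3))
    # fraction of draws outside the interval [0, 1]
    outside = float(np.mean((unit < 0) | (unit > 1)))
    fractions.append(outside)
    # scale to the column ranges, then force the result back into range
    scaled = np.clip((org_max - org_min) * unit + org_min, org_min, org_max)
    in_range = bool(((scaled >= org_min) & (scaled <= org_max)).all())
    print(f"mu={mu}, sigma={sigma}: {outside:.1%} outside [0, 1], "
          f"in range after clipping: {in_range}")
```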