Randomly select data from a dictionary

Time: 2014-04-23 09:19:59

Tags: python numpy scipy

import numpy as np
data = np.array([[0, 1, 2, 3, 4, 7, 6, 7, 8, 9, 10],
                 [10, 3, 10, 4, 7, 7, 7, 8, 11, 12, 11],
                 [10, 10, 3, 5, 7, 7, 7, 9, 11, 11, 11],
                 [3, 4, 3, 6, 7, 7, 7, 10, 11, 11, 11],
                 [4, 5, 6, 7, 7, 9, 10, 11, 11, 11, 11]], dtype='float')

my_groups = ['Group_A', 'Group_B', 'Group_C']
my_values = [7, 10, 11]
my_data = {}
for x, y in zip(my_groups, my_values):
    my_data[x] = np.where(data == y)
print(my_data)

#{'Group_C': (array([1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4], dtype=int64), array([ 8, 10,  8,  9, 10,  8,  9, 10,  7,  8,  9, 10], dtype=int64)), 'Group_B': (array([0, 1, 1, 2, 2, 3, 4], dtype=int64), array([10,  0,  2,  0,  1,  7,  6], dtype=int64)), 'Group_A': (array([0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4], dtype=int64), array([5, 7, 4, 5, 6, 4, 5, 6, 4, 5, 6, 3, 4], dtype=int64))}
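With a single argument, np.where returns a tuple of index arrays, one per axis, which is why each group's positions are split across a row array and a column array. If (row, col) pairs are more convenient, np.argwhere returns the same positions as an (N, 2) array, and sampling rows of that array yields pairs directly. This is a swapped-in alternative, not what the question uses; the sketch below assumes NumPy 1.17+ for np.random.default_rng, and the seed is arbitrary.

```python
import numpy as np

# Same array and groups as in the question.
data = np.array([[0, 1, 2, 3, 4, 7, 6, 7, 8, 9, 10],
                 [10, 3, 10, 4, 7, 7, 7, 8, 11, 12, 11],
                 [10, 10, 3, 5, 7, 7, 7, 9, 11, 11, 11],
                 [3, 4, 3, 6, 7, 7, 7, 10, 11, 11, 11],
                 [4, 5, 6, 7, 7, 9, 10, 11, 11, 11, 11]], dtype=float)
my_groups = ['Group_A', 'Group_B', 'Group_C']
my_values = [7, 10, 11]

# np.argwhere gives one (row, col) row per matching element, shape (N, 2),
# in the same order as the index arrays returned by np.where.
coords = {g: np.argwhere(data == v) for g, v in zip(my_groups, my_values)}

rng = np.random.default_rng(0)  # seeded only to make runs repeatable
samples = {}
for group, pairs in coords.items():
    # Pick 3 row numbers of `pairs`, with replacement as in the question.
    picked = rng.choice(len(pairs), size=3, replace=True)
    samples[group] = pairs[picked]  # shape (3, 2); each row is (row, col)

print(samples)
```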

Now I want to randomly select three index positions for each group and store them in a dictionary:

samples = {}
for x,y in zip(my_groups,my_data):
    for i in np.random.choice(len(my_data), 3, replace=True):
        samples[x] = np.array(my_data[x][i], dtype=np.int64)

print(samples)

Looking for good ideas. I can't get `samples` to work.
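A quick inspection of the objects involved shows where the loop above breaks down. This is a diagnostic sketch using the question's data, written for Python 3:

```python
import numpy as np

# Same data and grouping as in the question.
data = np.array([[0, 1, 2, 3, 4, 7, 6, 7, 8, 9, 10],
                 [10, 3, 10, 4, 7, 7, 7, 8, 11, 12, 11],
                 [10, 10, 3, 5, 7, 7, 7, 9, 11, 11, 11],
                 [3, 4, 3, 6, 7, 7, 7, 10, 11, 11, 11],
                 [4, 5, 6, 7, 7, 9, 10, 11, 11, 11, 11]], dtype=float)
my_groups = ['Group_A', 'Group_B', 'Group_C']
my_values = [7, 10, 11]
my_data = {x: np.where(data == y) for x, y in zip(my_groups, my_values)}

# len(my_data) counts dictionary keys (the groups), not matching positions,
# so np.random.choice(len(my_data), 3) only ever draws from {0, 1, 2}.
n_groups = len(my_data)

# Each value is a 2-tuple (row_indices, col_indices); the number of
# matches is the length of either array.
rows, cols = my_data['Group_A']
n_matches = len(rows)

# Indexing the tuple with i == 0 or 1 returns a whole index array, and
# i == 2 fails because the tuple has only two elements.
try:
    my_data['Group_A'][2]
    index_error = False
except IndexError:
    index_error = True

print(n_groups, n_matches, index_error)
```

Two further details: zip(my_groups, my_data) pairs each group name with a dictionary key, so y is never the index data, and samples[x] = ... overwrites the previous value on every iteration, so even a loop that ran would keep only the last draw.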

Edit: here is the code I used for a consistency check:

import numpy as np
data = np.array([[0, 1, 2, 3, 4, 7, 6, 7, 8, 9, 10],
                 [10, 3, 10, 4, 7, 7, 7, 8, 300, 12, 11],
                 [300, 10, 100, 5, 7, 7, 7, 9, 200, 11, 11],
                 [3, 4, 3, 6, 7, 200, 7, 100, 11, 11, 11],
                 [4, 5, 6, 7, 7, 9, 10, 11, 11, 11, 11]], dtype='float')

my_groups = ['Group_A', 'Group_B', 'Group_C']
my_values = [100, 200, 300]
my_data = {}
for x, y in zip(my_groups, my_values):
    my_data[x] = np.where(data == y)
print(my_data)

samples = {}
for x, y in my_data.items():
    idx_choice = np.random.choice(len(y[0]), 2, replace=False)
    samples[x] = (y[0][idx_choice], y[1][idx_choice])
print(samples)

samples = {}
for x, y in my_data.items():
    samples[x] = [(y[0][i], y[1][i]) for i in np.random.choice(len(y[0]), 2, replace=False)]
print(samples)
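The check can also be automated. A tuple of index arrays, like the one stored in samples[x] by the first loop, can index data directly, so every sampled position should hold that group's value. In this data each of 100, 200 and 300 occurs exactly twice, so drawing 2 without replacement must return both positions. A sketch, assuming Python 3:

```python
import numpy as np

data = np.array([[0, 1, 2, 3, 4, 7, 6, 7, 8, 9, 10],
                 [10, 3, 10, 4, 7, 7, 7, 8, 300, 12, 11],
                 [300, 10, 100, 5, 7, 7, 7, 9, 200, 11, 11],
                 [3, 4, 3, 6, 7, 200, 7, 100, 11, 11, 11],
                 [4, 5, 6, 7, 7, 9, 10, 11, 11, 11, 11]], dtype=float)
my_groups = ['Group_A', 'Group_B', 'Group_C']
my_values = [100, 200, 300]
my_data = {x: np.where(data == y) for x, y in zip(my_groups, my_values)}

samples = {}
for x, (rows, cols) in my_data.items():
    idx_choice = np.random.choice(len(rows), 2, replace=False)
    samples[x] = (rows[idx_choice], cols[idx_choice])

for x, y in zip(my_groups, my_values):
    # Advanced indexing with a (rows, cols) tuple picks one element per pair.
    picked_values = data[samples[x]]
    assert np.all(picked_values == y), (x, picked_values)
    # Both occurrences must be present, in some order.
    assert sorted(zip(*samples[x])) == sorted(zip(*my_data[x]))

print("consistency check passed")
```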

1 answer:

Answer 0 (score: 2)

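If I understand correctly, you want to randomly select three pairs of indices (as returned by np.where) for each of the groups you defined.

This can easily be done, for example with a list comprehension.

Consider: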

samples = {}
for x, y in my_data.items():
    samples[x] = [(y[0][i], y[1][i]) for i in np.random.choice(len(y[0]), 3)]

Output:

samples
{'Group_A': [(2, 5), (3, 6), (1, 6)],
 'Group_B': [(3, 7), (3, 7), (2, 0)],
 'Group_C': [(2, 10), (2, 8), (2, 10)]}
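np.random.choice samples with replacement by default, which is why this output contains repeated pairs such as (3, 7). If distinct positions are wanted, replace=False does it, but it raises ValueError when a group has fewer matches than requested. The sketch below caps the sample size at the number of matches; that cap, and the helper name sample_positions, are hypothetical choices, not part of the original answer. It assumes NumPy 1.17+ for np.random.default_rng.

```python
import numpy as np

data = np.array([[0, 1, 2, 3, 4, 7, 6, 7, 8, 9, 10],
                 [10, 3, 10, 4, 7, 7, 7, 8, 11, 12, 11],
                 [10, 10, 3, 5, 7, 7, 7, 9, 11, 11, 11],
                 [3, 4, 3, 6, 7, 7, 7, 10, 11, 11, 11],
                 [4, 5, 6, 7, 7, 9, 10, 11, 11, 11, 11]], dtype=float)
my_groups = ['Group_A', 'Group_B', 'Group_C']
my_values = [7, 10, 11]
my_data = {x: np.where(data == y) for x, y in zip(my_groups, my_values)}


def sample_positions(index_arrays, k, rng):
    """Hypothetical helper: draw up to k distinct (row, col) pairs.

    Groups with fewer than k matches return all of their positions
    (an assumed policy; replace=False with k > n would raise ValueError).
    """
    rows, cols = index_arrays
    n = min(k, len(rows))
    picked = rng.choice(len(rows), size=n, replace=False)
    return [(int(rows[i]), int(cols[i])) for i in picked]


rng = np.random.default_rng()
samples = {x: sample_positions(y, 3, rng) for x, y in my_data.items()}
print(samples)
```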

Edit: Alternatively, you may want output closer to what np.where returns. In that case you can do:

samples = {}
for x, y in my_data.items():
    idx_choice = np.random.choice(len(y[0]), 3)
    samples[x] = (y[0][idx_choice], y[1][idx_choice])

which gives:

samples
{'Group_A': (array([0, 3, 3]), array([7, 4, 5])),
'Group_B': (array([1, 1, 2]), array([2, 2, 1])),
'Group_C': (array([2, 3, 2]), array([9, 8, 9]))}
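The two output formats carry the same information and convert into each other with zip. A small sketch, using the Group_A pairs from the first output as input; it assumes at least one pair, since zip(*[]) yields nothing and would produce an empty tuple:

```python
import numpy as np

# Sampled positions in the list-of-pairs form (Group_A from the first output).
pairs = [(2, 5), (3, 6), (1, 6)]

# List of (row, col) pairs -> tuple of index arrays, np.where style.
as_where = tuple(np.array(axis) for axis in zip(*pairs))
# as_where holds (array([2, 3, 1]), array([5, 6, 6]))

# Tuple of index arrays -> list of (row, col) pairs again.
back = [(int(r), int(c)) for r, c in zip(*as_where)]
print(as_where, back)
```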