Randomly select one element on each row

Time: 2016-03-31 10:18:10

Tags: python pandas

Suppose my pandas DataFrame has 3 columns, like this:

     col1     col2     col3
0  banana1  banana2  banana2
1   apple1   apple2   apple3
2  monkey1  monkey2  monkey3
3  iphone1  iphone2  iphone3
4  runner1  runner2  runner3
5     pig1     pig2     pig3
6    wifi1    wifi2    wifi3
7    girl1    girl2    girl3
8     boy1     boy2     boy3
9  couple1  couple2  couple3

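How can I randomly select 1 of the 3 elements in each row and append it to a new DataFrame, loop that N times, and then move on to a new row, again appending 1 of the 3 elements from each row and looping N times?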

This is a bit hard to explain, so let me use an example:

import pandas as pd

data = {'col1': ['banana1', 'apple1', 'monkey1', 'iphone1', 'runner1', 'pig1', 'wifi1', 'girl1', 'boy1', 'couple1'],
        'col2': ['banana2', 'apple2', 'monkey2', 'iphone2', 'runner2', 'pig2', 'wifi2', 'girl2', 'boy2', 'couple2'],
        'col3': ['banana2', 'apple3', 'monkey3', 'iphone3', 'runner3', 'pig3', 'wifi3', 'girl3', 'boy3', 'couple3']}
df = pd.DataFrame(data, columns=['col1', 'col2' , 'col3'])

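So what I want to do is randomly select item1, item2 or item3 for each row and append it to a new row in a new DataFrame. Once an item has been picked for all 10 rows, I want it to start over N times, then move to a new row in the new DataFrame and loop N times again. The end result should look something like this (random):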

    1       2      3       4       5       6    7     8     9    10       11      12     13      14      15      16   17    18    19   20
    banana3 apple2 monkey1 iphone2 runner2 pig1 wifi2 girl3 boy1 couple1  banana1 apple2 monkey2 iphone3 runner3 pig3 wifi2 girl1 boy1 couple3
    ........................................................................................................................................... 
    ...........................................................................................................................................
    ...........................................................................................................................................
    banana1 apple2 monkey2 iphone3 runner1 pig2 wifi3 girl1 boy3 couple2  banana2 apple1 monkey2 iphone2 runner2 pig1 wifi2 girl3 boy1 couple2

In this output, the pick-1-of-3-per-row pass is looped 2 times for each of the N rows of the new DataFrame.

My attempt:

I'd like to do this with a function that produces the desired output from n and N.

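new_df = []

def rand_element_selection(n, N):
    for _, row in df.iterrows():       # iterrows() yields (index, row) pairs
        element_holder = df.sample(1)  # note: draws one random *row* of df, not one element of `row`
        new_df.append(element_holder)  # collect the sampled piece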
n and N are not used above yet because I'm stuck on how to proceed.
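A function along the lines described could be sketched as follows. This is a minimal sketch, not the asker's or either answerer's code: it follows the answers' convention (n = number of result rows, N = passes over the 10 input rows per result row), and the extra df and seed parameters are assumptions added for illustration. It draws a random column index for every input row in every pass and gathers the values with NumPy fancy indexing, so each row's element is chosen independently.

```python
import numpy as np
import pandas as pd

data = {'col1': ['banana1', 'apple1', 'monkey1', 'iphone1', 'runner1', 'pig1', 'wifi1', 'girl1', 'boy1', 'couple1'],
        'col2': ['banana2', 'apple2', 'monkey2', 'iphone2', 'runner2', 'pig2', 'wifi2', 'girl2', 'boy2', 'couple2'],
        'col3': ['banana2', 'apple3', 'monkey3', 'iphone3', 'runner3', 'pig3', 'wifi3', 'girl3', 'boy3', 'couple3']}
df = pd.DataFrame(data, columns=['col1', 'col2', 'col3'])


def rand_element_selection(df, n, N, seed=None):
    """Return an n x (N * len(df)) DataFrame (sketch; df/seed params are assumptions).

    Each output row is N consecutive passes over the rows of df; in every
    pass one value of each input row is drawn uniformly and independently.
    """
    rng = np.random.default_rng(seed)
    values = df.to_numpy()                      # shape (rows, cols)
    n_rows, n_cols = values.shape
    # one random column index per (result row, pass, input row)
    picks = rng.integers(0, n_cols, size=(n, N, n_rows))
    row_idx = np.arange(n_rows)                 # broadcasts against picks' last axis
    chosen = values[row_idx, picks]             # shape (n, N, n_rows)
    # flatten the N passes of each result row side by side
    return pd.DataFrame(chosen.reshape(n, N * n_rows))


out = rand_element_selection(df, n=3, N=2, seed=0)
print(out)
```

With n=3 and N=2 this yields a 3 x 20 DataFrame with columns 0 to 19; passing the same seed again reproduces the same picks. Drawing all indices at once with rng.integers avoids a Python-level loop over the rows.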

2 answers:

Answer 0 (score: 1)

IIUC, you can call sample with axis=1 and transpose the result:

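In [172]:
n = 3  # number of rows in the result
N = 2  # number of passes over df per result row
df_list = []
for i in range(n):
    # sample(1, axis=1) picks one whole column; .T turns it into a 1 x 10 row
    passes = [df.sample(1, axis=1).T.reset_index(drop=True) for j in range(N)]
    # glue the N passes side by side into one 1 x (N * 10) row
    df_list.append(pd.concat(passes, axis=1, ignore_index=True))
pd.concat(df_list, ignore_index=True)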

Out[172]:
        0       1        2        3        4     5      6      7     8   \
0  banana2  apple3  monkey3  iphone3  runner3  pig3  wifi3  girl3  boy3   
1  banana2  apple2  monkey2  iphone2  runner2  pig2  wifi2  girl2  boy2   
2  banana2  apple2  monkey2  iphone2  runner2  pig2  wifi2  girl2  boy2   

        9        10      11       12       13       14    15     16     17  \
0  couple3  banana2  apple3  monkey3  iphone3  runner3  pig3  wifi3  girl3   
1  couple2  banana1  apple1  monkey1  iphone1  runner1  pig1  wifi1  girl1   
2  couple2  banana2  apple3  monkey3  iphone3  runner3  pig3  wifi3  girl3   

     18       19  
0  boy3  couple3  
1  boy1  couple1  
2  boy3  couple3  
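Note that df.sample(1, axis=1) samples a whole column, so within each block of 10 values every item comes from the same source column (in the output above, row 0 is col3 in both blocks, and row 1 is col2 then col1); the column is random per block, not per row. A quick check of this behaviour, as a sketch:

```python
import pandas as pd

data = {'col1': ['banana1', 'apple1', 'monkey1', 'iphone1', 'runner1', 'pig1', 'wifi1', 'girl1', 'boy1', 'couple1'],
        'col2': ['banana2', 'apple2', 'monkey2', 'iphone2', 'runner2', 'pig2', 'wifi2', 'girl2', 'boy2', 'couple2'],
        'col3': ['banana2', 'apple3', 'monkey3', 'iphone3', 'runner3', 'pig3', 'wifi3', 'girl3', 'boy3', 'couple3']}
df = pd.DataFrame(data, columns=['col1', 'col2', 'col3'])

# sample(1, axis=1) returns a 10 x 1 frame holding one complete column
block = df.sample(1, axis=1)
picked = block.columns[0]

# every value in the block matches the same source column row by row
same_column = (block[picked] == df[picked]).all()
print(picked, same_column)
```

If each row's element should be drawn independently, a row-wise pick such as df.apply(np.random.choice, axis=1) does that instead.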

Answer 1 (score: 0)

The concatenation logic is mostly taken from EdChum's answer.

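import numpy as np

n = 3  # number of rows in the result
N = 2  # number of passes over df per result row
df_list = []
for i in range(n):
    # apply(np.random.choice, axis=1) draws one random value from each row independently
    df_list.append(pd.concat([df.apply(np.random.choice, axis=1) for j in range(N)], ignore_index=True))
# each list entry is a Series of length N * 10; stack them as columns, then transpose
new_df = pd.concat(df_list, axis=1, ignore_index=True).T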
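To sanity-check a result like new_df, one can verify its layout: the frame should be n x (N * 10), and column k should only ever hold a value from input row k mod 10. The check_layout helper below is hypothetical (not part of either answer), and swapping np.random.choice for a seeded numpy.random.Generator is an assumption made only so the run is reproducible.

```python
import numpy as np
import pandas as pd

data = {'col1': ['banana1', 'apple1', 'monkey1', 'iphone1', 'runner1', 'pig1', 'wifi1', 'girl1', 'boy1', 'couple1'],
        'col2': ['banana2', 'apple2', 'monkey2', 'iphone2', 'runner2', 'pig2', 'wifi2', 'girl2', 'boy2', 'couple2'],
        'col3': ['banana2', 'apple3', 'monkey3', 'iphone3', 'runner3', 'pig3', 'wifi3', 'girl3', 'boy3', 'couple3']}
df = pd.DataFrame(data, columns=['col1', 'col2', 'col3'])


def check_layout(result, df, n, N):
    """Hypothetical helper: True if result is n x (N * len(df)) and the column
    at position k only holds values taken from input row k % len(df)."""
    if result.shape != (n, N * len(df)):
        return False
    for k in range(result.shape[1]):
        allowed = set(df.iloc[k % len(df)])
        if not set(result.iloc[:, k]).issubset(allowed):
            return False
    return True


# build a result the same way as the answer above, but with a seeded Generator
rng = np.random.default_rng(42)
n, N = 3, 2
df_list = [pd.concat([df.apply(rng.choice, axis=1) for _ in range(N)], ignore_index=True)
           for _ in range(n)]
new_df = pd.concat(df_list, axis=1, ignore_index=True).T
print(check_layout(new_df, df, n, N))
```

The same check works on the output of the sample(axis=1) approach, since it only inspects shape and which input row each column draws from.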