Fisher-Yates Shuffle in Python

Time: 2016-12-04 17:35:53

Tags: python shuffle

I have the following dataset (this is a sample):

ID      Sub1    Sub2    Sub3    Sub4
Creb3l1 10.14   9.67    10.14   10.42
Chchd6  11.25   10.74   10.80   11.07
Arih1   9.91    9.25    10.20   9.34
Prpf8   11.54   11.58   11.14   11.36
Rfng    11.71   11.56   10.81   10.72
Rnf114  12.66   12.60   12.59   12.56

I want to perform a Fisher-Yates shuffle on this dataset 10 times (i.e. write 10 output files, each of which has the data randomised once using a Fisher-Yates shuffle).

I wrote this code:

import sys
import itertools
from itertools import permutations

for line in open(sys.argv[1]).readlines()[2:]:
    line = line.strip().split()
    ID = line[0]
    expression_values = line[1:]
    for shuffle in permutations(expression_values):
        print shuffle

The output of this code looks like this (sample):

('11.25', '10.74', '10.80', '11.07')
('11.25', '10.74', '11.07', '10.80')
('11.25', '10.80', '10.74', '11.07')
('11.25', '10.80', '11.07', '10.74')
('11.25', '11.07', '10.74', '10.80')
('11.25', '11.07', '10.80', '10.74')
('10.74', '11.25', '10.80', '11.07')
('10.74', '11.25', '11.07', '10.80')
('10.74', '10.80', '11.25', '11.07')
('10.74', '10.80', '11.07', '11.25')
('10.74', '11.07', '11.25', '10.80')
('10.74', '11.07', '10.80', '11.25')
('10.80', '11.25', '10.74', '11.07')
('10.80', '11.25', '11.07', '10.74')
('10.80', '10.74', '11.25', '11.07')
('10.80', '10.74', '11.07', '11.25')
('10.80', '11.07', '11.25', '10.74')
('10.80', '11.07', '10.74', '11.25')
('11.07', '11.25', '10.74', '10.80')
('11.07', '11.25', '10.80', '10.74')
('11.07', '10.74', '11.25', '10.80')
('11.07', '10.74', '10.80', '11.25')
('11.07', '10.80', '11.25', '10.74')
('11.07', '10.80', '10.74', '11.25')
('9.91', '9.25', '10.20', '9.34')
('9.91', '9.25', '9.34', '10.20')

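The specific part I'm having trouble with is producing blocks of randomised data (e.g. give me a block of 7 Fisher-Yates-randomised lines that I can write to a file). If someone could show me how to edit the code above to generate 10 output files, each containing 7 lines of text (i.e. the same number as the input file), with the values in each file randomised by a Fisher-Yates shuffle, I would appreciate it.

Edit 1: I tried a few different approaches, for example this code: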

for line in open(sys.argv[1]).readlines()[2:]:
    line = line.strip().split()
    gene_name = line[0]
    expression_values = line[1:]
    RandomList = []
    for shuffle in permutations(expression_values):
        while len(RandomList) <10:
            RandomList.append(shuffle)
    print RandomList

I thought this would give me 10 random permutations for each row. Instead it gave me the same line repeated 10 times for each row:

[('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07')]
[('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34')]
[('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36')]
[('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72')]
[('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56')]

Edit 2: Sean: thanks very much for your help. I do actually know how to write to a file, for example I can say:

for i in range(10):
   output_file = "random." + str(i)
   open_output_file = open(output_file, 'a')
   ***for each line of the randomised array***:
        open_output_file.write(line + "\n")
   open_output_file.close()

My problem with writing to the file is that I can't even get what I want printed to the screen first. For example, if I run this code:

import sys
import itertools
from itertools import permutations

for i in range(10):
    for line in open(sys.argv[1]).readlines()[2:]:
        line = line.strip().split()
        gene_name = line[0]
        expression_values = line[1:]
        for shuffle in permutations(expression_values):
            print shuffle[:6]
        print "***"
    i +=1

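I would expect the output to be something like 7 random lines, then "***", then 7 random lines, 10 times over. Instead it prints every permutation of every row.

3 Answers:

Answer 0: (score: 0)

I think I have a solution: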

import sys
from itertools import permutations

# Read the whole input file; the first line is the header
with open(sys.argv[1]) as infile:
    lines = infile.readlines()

# Write the header line to each of the 10 output files
for i in range(10):
    with open("random" + str(i) + ".txt", 'a') as out:
        out.write(lines[0].strip() + "\n")

# Append one permutation of every data row to each output file
for line in lines:
    # Skip the header and any blank lines
    if "Sub" not in line and line.strip():
        fields = line.strip().split()
        ID = fields[0]
        expression_values = fields[1:]
        # Take the first 10 permutations in the order itertools yields them
        for ind, perm in enumerate(list(permutations(expression_values))[0:10]):
            with open("random" + str(ind) + ".txt", 'a') as out:
                out.write(ID + "\t" + "\t".join(perm) + "\n")
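One caveat with the code above: list(permutations(...))[0:10] takes the first ten permutations in the order itertools generates them, so the ten files are deterministic rather than random. A minimal sketch of a randomised variant follows, assuming the standard-library random.shuffle is acceptable (as far as I know, CPython implements it as a Fisher-Yates shuffle). The function name write_shuffled_files and its parameters are hypothetical and not from the original post.

```python
import random

def write_shuffled_files(input_path, n_files=10, prefix="random"):
    """Write n_files copies of the table, each with every row's values shuffled."""
    with open(input_path) as src:
        # Skip blank lines so the header is always rows[0]
        rows = [line.split() for line in src if line.strip()]
    header, data = rows[0], rows[1:]

    names = []
    for i in range(n_files):
        name = "%s%d.txt" % (prefix, i)
        with open(name, "w") as out:
            out.write("\t".join(header) + "\n")
            for row in data:
                values = row[1:]        # slicing makes a copy of the values
                random.shuffle(values)  # shuffle the copy in place
                out.write("\t".join([row[0]] + values) + "\n")
        names.append(name)
    return names
```

Calling write_shuffled_files(sys.argv[1]) would then produce random0.txt through random9.txt, each with the same number of rows as the input and each row independently shuffled.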

Answer 1: (score: 0)

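import random

def shuffle(ary):
    # Walk from the last index down to index 1
    for d in range(len(ary) - 1, 0, -1):
        # Pick a random index e with 0 <= e <= d (randint is inclusive)
        e = random.randint(0, d)
        if e == d:
            continue  # the element stays where it is
        ary[d], ary[e] = ary[e], ary[d]
    return ary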

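The Fisher-Yates shuffle picks a random element from the part of the list that has not been shuffled yet and swaps it into the last position of that part. It repeats this once for every position in the array. In the code above, each iteration draws a random index between 0 and d (inclusive) and swaps that element with the one at index d.

See here: http://code.activestate.com/recipes/360461-fisher-yates-shuffle/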
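As a usage sketch, the function above can be applied to the expression values of a single row. Because it shuffles in place, the example passes a copy each time so the original values are left untouched. The row is taken from the question; the helper name shuffled_copy is hypothetical.

```python
import random

def shuffle(ary):
    # Fisher-Yates: walk from the last index down to index 1
    for d in range(len(ary) - 1, 0, -1):
        e = random.randint(0, d)  # inclusive on both ends
        ary[d], ary[e] = ary[e], ary[d]
    return ary

def shuffled_copy(values):
    # Hypothetical helper: shuffle a copy so the input is not modified
    return shuffle(list(values))

row = ["Chchd6", "11.25", "10.74", "10.80", "11.07"]
gene, values = row[0], row[1:]

# Ten independent shuffles of the same row, one per output file
ten_shuffles = [shuffled_copy(values) for _ in range(10)]
```

Each entry of ten_shuffles contains the same four values in a random order, which is what the question asks for on a per-row basis.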

Answer 2: (score: -1)

"Each file contains 7 lines of text"

It sounds like you want to do array slicing.

a = [ 1, 2, 3, 4, 5, 6 ]
a[:3]

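will produce [1, 2, 3]

Array slicing is done by giving a start index, an end index, and a step. In a[:3] the start index is omitted, so the slice starts at 0 and runs up to, but not including, index 3.

a[1:3] produces [2, 3]

a[1:5:2] starts at index 1, stops before index 5, and uses a step of 2, so it produces [2, 4]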
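The slicing rules above can be checked directly. This short sketch reuses the list a from the examples; the variable names are only for illustration.

```python
a = [1, 2, 3, 4, 5, 6]

first_three = a[:3]   # start omitted: indices 0, 1, 2
middle = a[1:3]       # indices 1 and 2
stepped = a[1:5:2]    # indices 1 and 3

assert first_three == [1, 2, 3]
assert middle == [2, 3]
assert stepped == [2, 4]
```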

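So in your example, it looks like you want to write shuffle[:6]

As for writing the files, you need some kind of loop

for i in range(0, 10):
    filename = "output-%s.txt" % i

This will produce the filenames output-0.txt, output-1.txt, and so on

Read https://docs.python.org/2/tutorial/inputoutput.html on file input/output. Basically, you should use the with keyword together with open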

with open(filename, 'w') as f:
    f.write(str(shuffle[:7]))

This should get you heading in the right direction
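Putting the filename loop and the with block together, a minimal sketch of the writing step might look like the following. The name write_outputs is hypothetical, and each block is assumed to be a list of already-shuffled rows, where a row is a list of strings.

```python
def write_outputs(blocks):
    """Write each block (a list of rows) to output-<i>.txt and return the filenames."""
    filenames = []
    for i, rows in enumerate(blocks):
        filename = "output-%s.txt" % i
        # The with statement closes the file even if a write fails
        with open(filename, "w") as f:
            for row in rows:
                f.write("\t".join(row) + "\n")
        filenames.append(filename)
    return filenames
```

Joining each row with tabs keeps the output in the same column layout as the input, whereas str(shuffle[:7]) would write the Python representation of the tuple.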