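How to get all possible genotype combinations for a set of SNPs

Time: 2018-10-03 03:33:08

Tags: python r combinations permutation

I currently have about 20 SNPs, and I want to get all possible genotype combinations for them. As an example, let's start with three SNPs and their alleles.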

SNP      A1         A2
SNP1      A          T
SNP2      C          G
SNP3      T          A

I first want to generate a list of all possible genotype permutations/combinations for these three SNPs, for example:

SNP1 SNP2 SNP3
  AA   CC   TT
  AA   CC   TA
  AA   CC   AA
  AA   CG   TT
  AA   CG   TA
  AA   CG   AA
  AA   GG   TT
  AA   GG   TA
  AA   GG   AA
  ...

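And so on, for the 3 ^ 3 = 27 possible combinations I expect.

From there, I would like to scale this up to my full set of about 20 SNPs. What would be a good way to do this in Python, or even in R?

2 answers:

Answer 0 (score: 3)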

We can use two functions from the standard itertools module to generate the combinations. First, we use combinations_with_replacement to build the 3 genotype pairs from the alleles of one SNP.

from itertools import combinations_with_replacement

def pairs(alleles):
    return [u + v for u, v in combinations_with_replacement(alleles, 2)]

print(pairs('TA'))

Output

['TT', 'TA', 'AA']

Then we use product to build all combinations from the list of SNPs.

from itertools import combinations_with_replacement, product

def pairs(alleles):
    return [u + v for u, v in combinations_with_replacement(alleles, 2)]

all_snps = ('AT', 'CG', 'TA')

for t in product(*[pairs(snp) for snp in all_snps]):
    print(t)
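
As a rough sketch of how the loop above might scale to the roughly 20 SNPs in the question (this is not part of the original answer): with three genotypes per diallelic SNP there are 3 ** n combinations, so 20 SNPs give 3 ** 20 = 3,486,784,401 tuples. That is probably too many to hold in a list, so the sketch assumes you iterate lazily and take only what you need. The helper name genotype_combinations and the placeholder list many_snps are hypothetical.

```python
from itertools import combinations_with_replacement, islice, product

def pairs(alleles):
    # Unordered allele pairs for one SNP, e.g. 'AT' -> ['AA', 'AT', 'TT']
    return [u + v for u, v in combinations_with_replacement(alleles, 2)]

def genotype_combinations(snps):
    # Hypothetical helper: a lazy iterator over every genotype combination.
    # Tuples are produced one at a time instead of being stored in a list.
    return product(*(pairs(snp) for snp in snps))

# Placeholder data (an assumption): 20 diallelic SNPs, all A/T for illustration
many_snps = ['AT'] * 20

total = 3 ** len(many_snps)  # three genotypes per SNP
first_four = list(islice(genotype_combinations(many_snps), 4))

print(total)
for combo in first_four:
    print(combo)
```

Because product is lazy, the same iterator can be streamed to a file or filtered without ever materialising all combinations at once.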

Answer 1 (score: 3)

Here is an R solution for the example you provided:

# Alleles for each SNP
alleles <- data.frame(
  A1 = c("A", "C", "T"),
  A2 = c("T", "G", "A"),
  row.names = paste0("SNP", 1:3)
)

# Get the three possible genotypes for each SNP (diallelic loci)
genotypes <- apply(alleles, 1, function(x) {
  paste0(x[c(1, 1, 2)], x[c(1, 2, 2)])
})  

# Generate all possible combinations
expand.grid(as.data.frame(genotypes))

Output

   SNP1 SNP2 SNP3
1    AA   CC   TT
2    AT   CC   TT
3    TT   CC   TT
4    AA   CG   TT
5    AT   CG   TT
6    TT   CG   TT
7    AA   GG   TT
8    AT   GG   TT
9    TT   GG   TT
10   AA   CC   TA
11   AT   CC   TA
12   TT   CC   TA
13   AA   CG   TA
14   AT   CG   TA
15   TT   CG   TA
16   AA   GG   TA
17   AT   GG   TA
18   TT   GG   TA
19   AA   CC   AA
20   AT   CC   AA
21   TT   CC   AA
22   AA   CG   AA
23   AT   CG   AA
24   TT   CG   AA
25   AA   GG   AA
26   AT   GG   AA
27   TT   GG   AA
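
Note that the R output lists rows with the first column (SNP1) varying fastest, whereas itertools.product varies the last position fastest. If the same row order is wanted in Python, one possible approach is sketched below; the function name expand_grid is a hypothetical helper named after the R function, not an existing Python API.

```python
from itertools import combinations_with_replacement, product

def pairs(alleles):
    # Unordered allele pairs for one SNP, e.g. 'AT' -> ['AA', 'AT', 'TT']
    return [u + v for u, v in combinations_with_replacement(alleles, 2)]

def expand_grid(columns):
    # product() varies the last position fastest. Reversing the inputs and
    # then reversing each result makes the first column vary fastest,
    # which matches the row order shown in the R output above.
    return [combo[::-1] for combo in product(*reversed(columns))]

all_snps = ('AT', 'CG', 'TA')
rows = expand_grid([pairs(snp) for snp in all_snps])

print(len(rows))
for row in rows[:4]:
    print(row)
```

This builds the full list in memory, which is fine for three SNPs but would not be practical for 20.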