R 'adegenet' package: DNAbin2genind does not format the data correctly

Date: 2016-02-22 20:37:33

Tags: r

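I want to use the adegenet package to run genetic data analyses. To do that, I need to convert my fasta file into a genind object that adegenet recognizes.

I tried reading the data in two different ways, with the same result.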

> mydata.fasta <- fasta2DNAbin("~/Desktop/blattodeatest/Cryptocercuspunctulatus/COII.afa")

> mydata.fasta
22 DNA sequences in binary format stored in a matrix.

All sequences of same length: 404 

Labels: AB425873_COII AB425877_COII AB425878_COII AB425876_COII AB425880_COII AB425884_COII ...

Base composition:
a     c     g     t 
0.404 0.181 0.085 0.329 

> mydata.dna <- read.dna("~/Desktop/blattodeatest/Cryptocercus punctulatus/COII.afa", format="fasta")

> mydata.dna
22 DNA sequences in binary format stored in a matrix.

All sequences of same length: 404 

Labels: AB425873_COII AB425877_COII AB425878_COII AB425876_COII AB425880_COII AB425884_COII ...

Base composition:
a     c     g     t 
0.404 0.181 0.085 0.329 

I then tried to convert the data but got a strange result.

> mydata.genind <- DNAbin2genind(mydata.fasta)
> mydata.genind
/// GENIND OBJECT /////////

 // 22 individuals; 91 loci; 189 alleles; size: 62.3 Kb

 // Basic content
   @tab:  22 x 189 matrix of allele counts
   @loc.n.all: number of alleles per locus (range: 2-3)
   @loc.fac: locus factor for the 189 columns of @tab
   @all.names: list of allele names for each locus
   @ploidy: ploidy of each individual  (range: 1-1)
   @type:  codom
   @call: DNAbin2genind(x = mydata.dna)

 // Optional content
   - empty -

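My data contain only a single 404 bp locus, and the fasta file appears to be read correctly. I can't figure out why R thinks there are 91 loci after I run DNAbin2genind.

1 answer:

Answer 0: (score: 0)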

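The reason is that a genind object treats each polymorphic position in the gene as a separate locus, so the 91 loci are the 91 variable sites in your 404 bp alignment. To get the positions of those polymorphic sites in the original alignment, type:

as.numeric(locNames(mydata.genind))

locNames() is the accessor for locus names; older adegenet releases also exposed them as the @loc.names slot.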
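To see why one gene can turn into many "loci", here is a minimal base-R sketch of the idea. It is only an illustration on a made-up toy alignment (the matrix `aln` and the names `n_alleles` and `poly_sites` are hypothetical), not adegenet's actual implementation: every alignment column with more than one distinct base counts as one locus, and every distinct base in such a column counts as one allele.

```r
# Toy alignment: 4 sequences of length 8 (made-up data, for illustration only)
aln <- rbind(
  seq1 = c("a", "c", "g", "t", "a", "c", "g", "t"),
  seq2 = c("a", "c", "g", "t", "a", "t", "g", "t"),
  seq3 = c("a", "g", "g", "t", "a", "c", "g", "t"),
  seq4 = c("a", "g", "g", "t", "a", "t", "g", "a")
)

# Count the distinct bases observed in each alignment column
n_alleles <- apply(aln, 2, function(col) length(unique(col)))

# A column is polymorphic when it holds more than one distinct base
poly_sites <- which(n_alleles > 1)

print(poly_sites)                  # alignment positions that vary
print(length(poly_sites))          # number of "loci" in the genind sense
print(sum(n_alleles[poly_sites]))  # total number of alleles across those loci
```

In this toy case only positions 2, 6 and 8 vary, so an 8 bp alignment would be reported as 3 loci with 6 alleles in total. The same logic explains the 91 loci and 189 alleles reported for the 404 bp COII alignment.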