Question

I have the following data:

   Sample_ID   SNP_Name Genotype Phenotype CV.Group
1     AUS002  rs1028005       AA         1        4
2     AUS002  rs4788050       TC         1        4
3     AUS002 rs17143930       CC         1        4
4     AUS002  rs3920214       AA         1        4
5     AUS002  rs1862520       GG         1        4
6     AUS002  rs1461224       AC         1        4

I reshaped it with the following command:

reshaped.data <- reshape(merged.data, timevar = "SNP_Name", idvar = c("Sample_ID","Phenotype","CV.Group"), direction = "wide")

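This gives me the grouping by Sample_ID that I want, and it works fine, except that each variable is only supposed to have three categories (genotype data).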

      Sample_ID Phenotype CV.Group Genotype.rs1028005 Genotype.rs4788050
1        AUS002         1        4                 AA                 TC
4039     AUS003         1        3                 GG               <NA>
7927     AUS004         1        4                 AA                 TC
11965    AUS005         0        2                 AG                 TT
16003    AUS007         0        2                 AA                 TC

However, when I tabulate one of the variables, it shows additional levels when it should only have three (e.g. AA, AG and GG). What is going wrong?

table(reshaped.data$Phenotype,reshaped.data$Genotype.rs1028005)

  -- AA AC AG AT CC CG GC GG TA TC TG TT
0  0 45  0 35  0  0  0  0  4  0  0  0  0
1  0 16  0 12  0  0  0  0  3  0  0  0  0

Answer 1

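I think this is a case of unused levels being left over after reshaping the dataset: each Genotype.* column is still a factor that carries every level of the original Genotype column. To remove the unused levels from a 'factor' variable, we can call factor on it again or use the droplevels function.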

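# Phenotype is passed unchanged: it shows no unused levels, and droplevels()
# has no method for plain numeric vectors
table(reshaped.data$Phenotype,
      droplevels(reshaped.data$Genotype.rs1028005))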

Or simply

 reshaped.data <- droplevels(reshaped.data)
 table(reshaped.data[,c('Phenotype', 'Genotype.rs1028005')])
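
To see why the extra columns appear, here is a minimal sketch on made-up toy data (the data frame `toy`, the SNP names `rsA`/`rsB` and the sample IDs are invented for illustration and are not the asker's dataset). It assumes, as the output in the question suggests, that Genotype is stored as a factor before reshaping.

```r
# Toy data, invented for illustration. Genotype is one factor whose
# levels cover the genotypes of every SNP in the long table.
toy <- data.frame(
  Sample_ID = rep(c("S1", "S2", "S3"), each = 2),
  SNP_Name  = rep(c("rsA", "rsB"), times = 3),
  Genotype  = factor(c("AA", "TC", "AG", "TT", "GG", "TC")),
  Phenotype = rep(c(1, 0, 1), each = 2),
  stringsAsFactors = FALSE
)

wide <- reshape(toy, timevar = "SNP_Name",
                idvar = c("Sample_ID", "Phenotype"),
                direction = "wide")

# The wide column keeps all levels of the original factor,
# including genotypes that only occur for the other SNP.
levels(wide$Genotype.rsA)

# Either call drops the levels that do not occur in this column.
g1 <- droplevels(wide$Genotype.rsA)
g2 <- factor(wide$Genotype.rsA)
levels(g1)

# The table now only has columns for genotypes that are present.
table(wide$Phenotype, g1)
```

Calling droplevels on the whole data frame, as in the second variant above, does the same thing for every factor column at once and leaves non-factor columns alone.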

Wrong factor levels after using the reshape command
