Question

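I have a data frame with individuals as columns and SNPs as rows. Each individual's column holds a single allele (e.g. G or A, or N if it is missing). There are also separate columns giving the major and minor allele for each SNP. I am trying to convert each single-allele value into a biallelic value based on the major and minor allele columns: if an individual's allele is the major allele, I want to paste the minor allele after it with a space separator, and vice versa. If the value is missing (N), I want to replace it with 0. The idea is to format these data for Plink.

So far I have tried using the ifelse function, without success. Any suggestions on how to get the biallelic values? Thank you very much! I have included a made-up dataset in the format I am referring to.

What I have now: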

rs#       major minor   ind1    ind2    ind3    ind4
rs123456    A    G      A       A       A        G
rs123457    G    C      C       G       C        G
rs123458    C    G      C       C       G        C
rs123459    T    A      A       T       N        T

What I want:

rs        major minor   ind1    ind2    ind3    ind4
rs123456    A    G      A G     A G     A G      G A
rs123457    G    C      C G     G C     C G      G C
rs123458    C    A      C A     C A     A C      C A
rs123459    T    A      A T     T A     0 0      T A

Thanks! Rob

Answer 1

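Here is one way to do it. Go through the data row by row and, for each individual, paste on the complementary allele (the minor allele if the individual carries the major one, and the major allele otherwise). Note that your input and expected output do not match: for rs123458 the minor allele is G in the input but A in the expected output.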

xy <- read.table(text = "rs       major minor   ind1    ind2    ind3    ind4
rs123456    A    G      A       A       A        G
rs123457    G    C      C       G       C        G
rs123458    C    G      C       C       G        C
rs123459    T    A      A       T       N        T", header = TRUE, colClasses = "character")
xy

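# Work through the data one row (SNP) at a time.
out <- apply(xy, MARGIN = 1, FUN = function(x) {
  # Only touch the individual columns (ind1, ind2, ...), not rs/major/minor.
  findind <- grepl("^ind", names(x))
  # Major allele -> "major minor"
  x[x %in% x["major"] & findind] <- paste(x[x %in% x["major"] & findind], x["minor"])
  # Minor allele -> "minor major"
  x[x %in% x["minor"] & findind] <- paste(x[x %in% x["minor"] & findind], x["major"])
  # Missing genotype -> "0 0"
  x[x %in% "N"] <- "0 0"
  list(x)
})
# Unwrap the per-row lists and bind the rows back into a data frame.
out <- sapply(out, "[", 1)
as.data.frame(do.call(rbind, out))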

        rs major minor ind1 ind2 ind3 ind4
1 rs123456     A     G  A G  A G  A G  G A
2 rs123457     G     C  C G  G C  C G  G C
3 rs123458     C     G  C G  C G  G C  C G
4 rs123459     T     A  A T  T A  0 0  T A
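
Since the question mentions ifelse, the same rule can also be sketched column by column with nested ifelse calls instead of a row-wise apply. This is only a sketch under a few assumptions: the individual columns are the ones whose names start with "ind", all columns are read as character (colClasses = "character" is added here so a column containing only T is not read as TRUE), and any value that is neither the major nor the minor allele, including N, is treated as missing and becomes "0 0". The name ind_cols is just a helper introduced for this example.

```r
# Same toy data as in the question.
xy <- read.table(text = "rs major minor ind1 ind2 ind3 ind4
rs123456 A G A A A G
rs123457 G C C G C G
rs123458 C G C C G C
rs123459 T A A T N T", header = TRUE, colClasses = "character")

# Names of the individual columns (assumed to start with "ind").
ind_cols <- grep("^ind", names(xy), value = TRUE)

# For each individual column, compare against major/minor element-wise.
xy[ind_cols] <- lapply(xy[ind_cols], function(allele) {
  ifelse(allele == xy$major, paste(allele, xy$minor),   # major -> "major minor"
  ifelse(allele == xy$minor, paste(allele, xy$major),   # minor -> "minor major"
         "0 0"))                                        # anything else -> missing
})
xy
```

On the toy data this should give the same individual columns as the apply version above. Because the comparisons are vectorised over SNPs, it avoids converting each row to a character vector, which may matter on larger SNP tables.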
