How can I create variables and apply them to columns?

Time: 2018-06-20 16:09:43

Tags: r dplyr bioinformatics

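This is a two-part question. I have a dataset in which I am trying to add selected columns together, but I also want to change the data first so that it is easier to add up. Here is an example of my dataset, which is called ChrData: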

ChrData
  Chr location sample1 sample2 sample3 sample4 sample5
1 1 34234 ./. 0/1 1/1 0/1 0/0
2 1 5677876 0/1 1/1 1/2 0/0 1/1
3 1 75424 ./. ./. 1/1 0/1 0/0
4 1 98654 1/1 0/1 1/1 0/0 0/0
5 1 4534 1/1 0/1 ./. 0/0 2/2

So I want to set:

./. = 0 
0/0 = 0
0/1 = 1
1/2 = 1
1/1 = 2
2/2 = 2
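The six codes above appear to follow one pattern: missing (`./.`) and homozygous-reference (`0/0`) calls map to 0, heterozygous calls (`0/1`, `1/2`) map to 1, and homozygous non-reference calls (`1/1`, `2/2`) map to 2. Assuming that pattern is what is intended for any other codes in the real data too, a minimal base-R sketch can derive the value from the two alleles instead of listing every code. `genotype_code` is a hypothetical helper name, and the sketch only handles unphased `a/b` strings like the ones shown.

```r
# Hypothetical helper: derive the numeric code from a genotype string such as "0/1".
# Assumption: 0 = missing or homozygous reference, 1 = heterozygous,
# 2 = homozygous non-reference (this reproduces the six codes listed above).
genotype_code <- function(gt) {
  alleles <- strsplit(as.character(gt), "/", fixed = TRUE)
  vapply(alleles, function(a) {
    if (any(a == ".")) return(0)   # "./." -> missing call
    if (a[1] != a[2]) return(1)    # two different alleles -> heterozygous
    if (a[1] == "0") return(0)     # "0/0" -> homozygous reference
    2                              # "1/1", "2/2" -> homozygous non-reference
  }, numeric(1))
}

genotype_code(c("./.", "0/0", "0/1", "1/2", "1/1", "2/2"))
```

If the real data contains codes outside this pattern, an explicit lookup table is the safer choice.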

and then add up the columns:

ChrData$sample1 + ChrData$sample2 + ChrData$sample4

and also:

ChrData$sample3 + ChrData$sample5 

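Then I want to use these results to create two new columns. I'm just not sure how to get R to recognize the new values and then apply them to every cell.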
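One way to apply the recoding to every cell at once, sketched in base R without dplyr: store the mapping as a named vector (`codes` is an illustrative name), index it with all the genotype strings together, and then use `rowSums()` on each group of columns. The data is the example table from the question; the new column names `Sum1` and `Sum2` are placeholders.

```r
# Example data from the question (sample columns read as character strings)
ChrData <- read.table(text = "
Chr location sample1 sample2 sample3 sample4 sample5
1 1 34234 ./. 0/1 1/1 0/1 0/0
2 1 5677876 0/1 1/1 1/2 0/0 1/1
3 1 75424 ./. ./. 1/1 0/1 0/0
4 1 98654 1/1 0/1 1/1 0/0 0/0
5 1 4534 1/1 0/1 ./. 0/0 2/2", stringsAsFactors = FALSE)

# Lookup table: names are the genotype strings, values are the numbers to add up
codes <- c("./." = 0, "0/0" = 0, "0/1" = 1, "1/2" = 1, "1/1" = 2, "2/2" = 2)

# Flatten the sample block to one character vector, look every cell up at once,
# then rebuild a numeric matrix with the same shape and column names
geno <- as.matrix(ChrData[paste0("sample", 1:5)])
num  <- matrix(unname(codes[c(geno)]), nrow = nrow(geno), dimnames = dimnames(geno))

# Row-wise sums over each group of samples
ChrData$Sum1 <- rowSums(num[, c("sample1", "sample2", "sample4")])
ChrData$Sum2 <- rowSums(num[, c("sample3", "sample5")])
ChrData[c("Sum1", "Sum2")]
```

Indexing a named vector with a character vector is vectorised, so there is no explicit loop over cells.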

2 answers:

Answer 0 (score: 1)

First, a basic function to consider, assuming every element in the sample columns is a character string:

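replacement <- function(x) {
  x <- as.character(x)                     # guard against factor columns
  x <- replace(x, which(x == './.'), 0)
  x <- replace(x, which(x == '0/0'), 0)
  x <- replace(x, which(x == '0/1'), 1)
  x <- replace(x, which(x == '1/2'), 1)
  x <- replace(x, which(x == '1/1'), 2)
  x <- replace(x, which(x == '2/2'), 2)
  as.numeric(x)                            # return numbers, not the strings "0"/"1"/"2"
}

# Recode only the sample columns; lapply() keeps ChrData a data frame,
# whereas apply() would turn it into a character matrix on which $ fails
ChrData[, 3:7] <- lapply(ChrData[, 3:7], replacement)

ChrData$Sum1 <- ChrData$sample1 + ChrData$sample2 + ChrData$sample4
ChrData$Sum2 <- ChrData$sample3 + ChrData$sample5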
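Why the choice between apply() and lapply() matters here: apply() converts a data frame to a matrix before looping, so with character sample columns its result is a character matrix, and `$` column access no longer works on it. A small base-R check on two rows of the example data; for brevity it recodes with a named lookup vector (`codes`, an illustrative name) instead of the six replace() calls.

```r
# Two rows of the question's data, sample columns kept as character strings
df <- data.frame(Chr = c(1, 1), location = c(34234, 5677876),
                 sample1 = c("./.", "0/1"), sample2 = c("0/1", "1/1"),
                 stringsAsFactors = FALSE)

# apply() converts the data frame to a matrix first, so the result is a
# character matrix, not a data frame; res$sample1 would raise an error
res <- apply(df, 2, function(x) x)
is.data.frame(res)   # FALSE
is.character(res)    # TRUE

# lapply() over just the sample columns returns a list of columns, which
# `[<-` writes back into the data frame, leaving Chr and location untouched
codes <- c("./." = 0, "0/0" = 0, "0/1" = 1, "1/2" = 1, "1/1" = 2, "2/2" = 2)
df[3:4] <- lapply(df[3:4], function(x) unname(codes[x]))
is.data.frame(df)    # TRUE
df$sample1 + df$sample2
```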

Answer 1 (score: 1)

Using dplyr:

# reproducible data
ChrData <- read.table(text = "
Chr location sample1 sample2 sample3 sample4 sample5
1 1 34234 ./. 0/1 1/1 0/1 0/0
2 1 5677876 0/1 1/1 1/2 0/0 1/1
3 1 75424 ./. ./. 1/1 0/1 0/0
4 1 98654 1/1 0/1 1/1 0/0 0/0
5 1 4534 1/1 0/1 ./. 0/0 2/2", stringsAsFactors = FALSE)

library(dplyr)

# make lookup map
MAP <- setNames(c(0,0,1,1,2,2), c("./.","0/0","0/1","1/2","1/1","2/2"))

# convert using MAP, then rowsums per sample groups
ChrData <- ChrData %>% 
  mutate(across(starts_with("sample"), ~ unname(MAP[.x]))) %>% 
  mutate(s124 = rowSums(.[ c("sample1","sample2","sample4") ]),
         s35 = rowSums(.[ c("sample3","sample5") ]))

ChrData
#   Chr location sample1 sample2 sample3 sample4 sample5 s124 s35
# 1   1    34234       0       1       2       1       0    2   2
# 2   1  5677876       1       2       1       0       2    3   3
# 3   1    75424       0       0       2       1       0    1   2
# 4   1    98654       2       1       2       0       0    3   2
# 5   1     4534       2       1       0       0       2    3   2
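One caveat with the MAP lookup: any genotype string that is not one of the six names (for example a `0/2` call, or a phased call written with `|`, if the real data contains such codes) comes back as NA, and rowSums() then returns NA for that row. A hedged sketch of one way to surface those cases before summing, assuming you would rather find them than silently drop them; `gt`, `vals` and `unmapped` are illustrative names.

```r
MAP <- setNames(c(0, 0, 1, 1, 2, 2), c("./.", "0/0", "0/1", "1/2", "1/1", "2/2"))

gt   <- c("0/1", "0/2", "1/1")   # "0/2" is not one of the names in MAP
vals <- unname(MAP[gt])          # c(1, NA, 2)

# List the codes that the lookup could not translate
unmapped <- unique(gt[is.na(vals)])
unmapped                         # "0/2"

# sum()/rowSums() propagate NA unless na.rm = TRUE, which treats the
# unmapped cell as if it contributed nothing to the total
sum(vals)                        # NA
sum(vals, na.rm = TRUE)          # 3
```

Whether `na.rm = TRUE` is appropriate depends on what an unmapped call should mean for the sums; adding the extra codes to MAP is usually the cleaner fix.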