I have the following data frame:
ID<-seq(1:5) #patient ID
snp1<-c("A","T","A","A","T")
snp2<-c("C","C","0","C","C")
snp3<-c("A","G","A","A","G")
snp4<-c("T","0","C","G","T")
snp5<-c("G","G","G","G","A")
dat<-data.frame(ID,snp1,snp2,snp3,snp4,snp5)
print(dat)
which gives:
ID snp1 snp2 snp3 snp4 snp5
1 1 A C A T G
2 2 T C G 0 G
3 3 A 0 A C G
4 4 A C A G G
5 5 T C G T A
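I am trying to use a nested for loop to count the occurrences of a given value in each column of dat. First, I create an empty data frame whose columns are snp1 to snp5 and whose rows are the possible values that a column of dat can take: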
results<- data.frame(matrix(0,ncol = 5, nrow = 5))
colnames(results)=c("snp1","snp2","snp3","snp4","snp5")
rownames(results)=c("A","T","C","G","0")
To check the code I want to put inside my loop, I run the following:
results["A","snp1"]<-nrow(subset(dat,subset= snp1=="A"))
print(results)
This correctly gives 3 for snp1, which contains A three times in dat:
snp1 snp2 snp3 snp4 snp5
A 3 0 0 0 0
T 0 0 0 0 0
C 0 0 0 0 0
G 0 0 0 0 0
0 0 0 0 0 0
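Then I use the nested for loop below to do the same for every column (the first loop), repeating the process for each possible value that a column of dat can take (the second loop):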
for(i in colnames(results)){for(j in c("A","T","C","G","0")){
snp<-as.name(i)
results[j,i]=nrow(subset(dat,subset= snp==j))
results
}}
print(results)
which gives a data frame filled entirely with 0s:
snp1 snp2 snp3 snp4 snp5
A 0 0 0 0 0
T 0 0 0 0 0
C 0 0 0 0 0
G 0 0 0 0 0
0 0 0 0 0 0
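I have spent hours online trying to work out what the problem is, but I cannot find an explanation. I originally wanted this procedure to depend on the value of a phenotype column added to dat, so that I would get counts for cases and controls, but I cannot get past this point. Any suggestions would be greatly appreciated. Thanks.
Answer 0 (score: 0)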
When working in a loop like this, I prefer to subset by index. That is, change subset(dat,subset= snp==j)
to dat[dat[, i] == j, ]
. I hope this helps!
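For reference, here is a minimal sketch of the question's loop with that one change applied. It rebuilds the example data so it can be run on its own; the helper names values and snps are introduced here for illustration and are not in the original post. As far as I can tell, the original loop returns zeros because snp holds a symbol (the result of as.name(i)) rather than the column itself, so snp == j compares the name "snp1" with "A", yields a single FALSE, and subset() keeps no rows.

```r
# Rebuild the example data from the question.
dat <- data.frame(
  ID   = 1:5,
  snp1 = c("A", "T", "A", "A", "T"),
  snp2 = c("C", "C", "0", "C", "C"),
  snp3 = c("A", "G", "A", "A", "G"),
  snp4 = c("T", "0", "C", "G", "T"),
  snp5 = c("G", "G", "G", "G", "A"),
  stringsAsFactors = FALSE
)

values <- c("A", "T", "C", "G", "0")   # possible values (illustrative name)
snps   <- setdiff(names(dat), "ID")    # SNP column names (illustrative name)

# Empty results table: one row per value, one column per SNP.
results <- data.frame(matrix(0, nrow = length(values), ncol = length(snps),
                             dimnames = list(values, snps)))

for (i in snps) {
  for (j in values) {
    # Index the column by its name (a string) instead of using subset().
    results[j, i] <- nrow(dat[dat[, i] == j, ])
  }
}
print(results)
```

Indexing with dat[, i] works because i is a plain character string, so no non-standard evaluation is involved.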
Of course, you don't have to use a loop to solve this kind of problem. You could do:
values <- c("A","T","C","G","0")
apply(dat[, -1], 2, function(x) sapply(values, function(y) length(which(x == y))))
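The question also mentions wanting separate counts for cases and controls. One possible way to extend the loop-free approach is sketched below. Note that the pheno column and its values are made up for illustration (the question does not show the phenotype data), and counts_by_pheno is a hypothetical name.

```r
# Rebuild the example data from the question.
dat <- data.frame(
  ID   = 1:5,
  snp1 = c("A", "T", "A", "A", "T"),
  snp2 = c("C", "C", "0", "C", "C"),
  snp3 = c("A", "G", "A", "A", "G"),
  snp4 = c("T", "0", "C", "G", "T"),
  snp5 = c("G", "G", "G", "G", "A"),
  stringsAsFactors = FALSE
)

# ASSUMPTION: a hypothetical phenotype column; these labels are invented.
dat$pheno <- c("case", "control", "case", "control", "case")

values <- c("A", "T", "C", "G", "0")
snps   <- paste0("snp", 1:5)

# Split the SNP columns by phenotype, then count each value per column.
# Using factor() with fixed levels keeps zero counts in the table.
counts_by_pheno <- lapply(
  split(dat[snps], dat$pheno),
  function(d) sapply(d, function(x) table(factor(x, levels = values)))
)
print(counts_by_pheno)
```

The result is a list with one count matrix per phenotype group, for example counts_by_pheno$case and counts_by_pheno$control.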
Answer 1 (score: 0)
Write a function that does the right thing for one column, e.g.,
fun = function(x)
table(factor(x, levels = c("A", "C", "G", "T", "0")))
and then apply it to all columns:
apply(dat[,-1], 2, fun)
It would probably be much better to use NA
rather than 0 to represent missing values; in that case, adjust the function:
fun = function(x)
table(factor(x, levels = c("A", "C", "G", "T")), useNA = "always")
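As a rough sketch of how the NA-based version might be used with the question's data: the "0" codes first have to be recoded to NA, a step the answer does not show, so that part is an assumption on my side. The names snps and counts are introduced here for illustration.

```r
# Rebuild the example data from the question.
dat <- data.frame(
  ID   = 1:5,
  snp1 = c("A", "T", "A", "A", "T"),
  snp2 = c("C", "C", "0", "C", "C"),
  snp3 = c("A", "G", "A", "A", "G"),
  snp4 = c("T", "0", "C", "G", "T"),
  snp5 = c("G", "G", "G", "G", "A"),
  stringsAsFactors = FALSE
)

# ASSUMPTION: "0" marks a missing genotype, so recode it to NA.
snps <- dat[, -1]
snps[snps == "0"] <- NA

# Per-column counts; useNA = "always" adds a final row for missing values.
fun <- function(x)
  table(factor(x, levels = c("A", "C", "G", "T")), useNA = "always")

counts <- apply(snps, 2, fun)
print(counts)
```

The last row of counts holds the number of missing genotypes in each column.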