Question

Suppose I have a genotype dataset, geno:

FID rs1 rs2 rs3
1   1   0   2
2   1   1   1
3   0   1   1
4   0   1   0
5   0   0   2

Another dataset, coed, holds one coefficient per SNP:

rs1 rs2 rs3
0.6 0.2 0.3

If I run the following code:

geno$rs1 <- geno$rs1 * coed$rs1
geno$rs2 <- geno$rs2 * coed$rs2
geno$rs3 <- geno$rs3 * coed$rs3

sum3 <- rowSums(geno[,c(2:4)])
c <- cbind(geno,sum3)

I get the output I want:

FID rs1 rs2 rs3 sum3
1   0.6 0   0.6 1.2
2   0.6 0.2 0.3 1.1
3   0   0.2 0.3 0.5
4   0   0.2 0   0.2
5   0   0   0.6 0.6

But I have thousands of SNPs, so I tried to build the loop below:

snp <- names(geno)[2:4]

geno.new <- numeric(0)

for (i in snp){
geno.new[i] = geno1[i] * coed[i]
}

The result is not what I expected:

$rs1
 [1] 0.6 0.6 0.0 0.0 0.0 

$rs2
 [1] 0.0 0.2 0.2 0.2 0.0 

$rs3
 [1] 0.6 0.3 0.3 0.0 0.6

Can someone help me improve it?

Thanks

Answer 1

I did find a solution; see the following code:

## read datasets

geno <- read.table("Genotype.csv",header=T,sep=",")

dim(geno)

coed <- read.table("beta.csv",header=T,sep=",")

## define the snp name
snp <- names(geno)[2:4]

## building for loop

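for (i in snp){
  ## [[ ]] extracts each column as a plain vector, so the single weight is recycled down the rows
  geno[[i]] <- geno[[i]] * coed[[i]]
}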

## calculate the sums
sum <- rowSums(geno[,c(2:4)])

## combine the results
all <- cbind(geno,sum)
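
With thousands of SNPs, the same weighted sum can also be computed without an explicit loop. The following is only a sketch under stated assumptions: it rebuilds the toy data from the question in place of reading the CSV files, assumes coed has exactly one row, and assumes its column names match the SNP columns of geno. The names w, weighted and result are made up for illustration.

```r
# Toy data copied from the question; in practice these would come from read.table().
geno <- data.frame(FID = 1:5,
                   rs1 = c(1, 1, 0, 0, 0),
                   rs2 = c(0, 1, 1, 1, 0),
                   rs3 = c(2, 1, 1, 0, 2))
coed <- data.frame(rs1 = 0.6, rs2 = 0.2, rs3 = 0.3)

# SNP columns: everything except the ID column.
snp <- setdiff(names(geno), "FID")

# Per-SNP weights as a plain named numeric vector, in the same order as snp.
w <- unlist(coed[1, snp])

# Multiply each SNP column by its weight (MARGIN = 2 means "by column").
weighted <- sweep(as.matrix(geno[snp]), 2, w, `*`)

# Row sums of the weighted genotypes.
sum3 <- rowSums(weighted)

# Put the ID column, the weighted genotypes and the sums side by side.
result <- cbind(geno["FID"], weighted, sum3 = sum3)
print(result)
```

If only the per-individual sum is needed, the matrix product as.matrix(geno[snp]) %*% w should give the same values as sum3 in one step. Selecting columns by name through snp, rather than by position, avoids having to update 2:4 as the number of SNPs grows.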

A for loop to compute with two vectors in R
