I have a matrix (similar to a wig file) that looks like this:
Position reference A C G T N sum(total read counts)
68773265 A 1 0 0 0 0 1
68773266 C 0 1 0 1 0 2
68773267 C 0 1 1 2 0 4
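To get the variant (non-reference) allele ratio, I want to compute the following for each position: (sum - count of the reference base) / sum * 100. The result should look like this: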
Position reference frequency(%) sum(total read counts)
68773265 A 0 1
68773266 C 50 2
68773267 C 75 4
Please give me some advice on this problem. Thanks in advance!
Answer 0 (score: 0)
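Subset the count columns using the column names in 'nm1'. Use match on the 'reference' column and 'nm1' to get the column index of the reference base for each row, then cbind that with 1:nrow(df1) to create a two-column row/column index. Get the rowSums of the 'nm1' columns ('Sum1'), and use it to create 'frequencyPercent' according to the formula in the post. Indexing df1[nm1] with the row/column index picks out exactly one cell per row, namely the count of the reference base.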
nm1 <- c('A', 'C', 'G', 'T') # this could include `N` also
indx <- cbind(1:nrow(df1), match(df1$reference, nm1))
Sum1 <- rowSums(df1[nm1])
data.frame(df1[1:2], frequencyPercent=100*(Sum1-df1[nm1][indx])/Sum1,
SumTotalCounts=df1[,ncol(df1)])
Or use transform on the original dataset (check.names=FALSE keeps the 'sum(total read counts)' column name unchanged):
transform(df1, frequencyPercent=100*(Sum1-df1[nm1][indx])/Sum1,
check.names=FALSE)[c(1:2,8:9)]
# Position reference sum(total read counts) frequencyPercent
#1 68773265 A 1 0
#2 68773266 C 2 50
#3 68773267 C 4 75
The example data used above:
df1 <- structure(list(Position = 68773265:68773267, reference = c("A",
"C", "C"), A = c(1L, 0L, 0L), C = c(0L, 1L, 1L), G = c(0L, 0L,
1L), T = 0:2, N = c(0L, 0L, 0L), `sum(total read counts)` = c(1L,
2L, 4L)), .Names = c("Position", "reference", "A", "C", "G",
"T", "N", "sum(total read counts)"), class = "data.frame",
row.names = c(NA, -3L))
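As a rough illustration of the row/column indexing idea, the same calculation can be wrapped in a small function. This is only a sketch: the name variant_percent, its bases argument, and the choice to return NA for positions with zero coverage are assumptions made for this example and are not part of the original answer. The data frame is rebuilt here so the snippet runs on its own.

```r
# Hypothetical helper (not from the original answer): percentage of reads
# at each position that do not match the reference base.
variant_percent <- function(df, bases = c("A", "C", "G", "T", "N")) {
  counts <- as.matrix(df[bases])
  # Two-column (row, column) index: selects one cell per row,
  # i.e. the count of that row's reference base.
  idx <- cbind(seq_len(nrow(df)), match(df$reference, bases))
  ref_count <- counts[idx]
  total <- rowSums(counts)
  # Assumption: zero-coverage positions give NA instead of NaN (0/0).
  unname(ifelse(total > 0, 100 * (total - ref_count) / total, NA_real_))
}

# Same values as the example data in the question.
df1 <- data.frame(
  Position  = 68773265:68773267,
  reference = c("A", "C", "C"),
  A = c(1L, 0L, 0L), C = c(0L, 1L, 1L), G = c(0L, 0L, 1L),
  T = 0:2, N = c(0L, 0L, 0L),
  stringsAsFactors = FALSE
)

result <- data.frame(df1[1:2], frequencyPercent = variant_percent(df1))
print(result)
```

For the three example positions this gives 0, 50 and 75, matching the table requested in the question. Because 'N' is included in bases here, the row totals equal the 'sum(total read counts)' column of the original data.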