Question

I have a .txt file that looks like this:

rs1 NC AB NC     
rs2 AB NC AA  
rs3 NC NC NC  
...

For each row, I want to count how often "NC" occurs, so that my output looks like this:

rs1 2  
rs2 1  
rs3 3  
...

Can someone tell me how to do this in R or on Linux? Thanks a lot!

Answer 1

df <- read.table("file.txt")  # "file.txt" is a placeholder for the path to your file
df$count <- rowSums(df[-1] == "NC")
df
#    V1 V2 V3 V4 count
# 1 rs1 NC AB NC     2
# 2 rs2 AB NC AA     1
# 3 rs3 NC NC NC     3

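We can apply rowSums to the logical matrix created by the expression df[-1] == "NC". df[-1] drops the first (ID) column, and each TRUE counts as 1 when the row is summed.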
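Here is a self-contained sketch of the same rowSums idea that prints exactly the "ID count" format asked for in the question. As an assumption for illustration, it reads the sample rows inline with read.table(text = ...) instead of from a file, and the column name count is just an illustrative choice.

```r
# Sample data from the question, read inline instead of from a file
df <- read.table(text = "rs1 NC AB NC
rs2 AB NC AA
rs3 NC NC NC")

# df[-1] drops the ID column; comparing with "NC" yields a logical matrix,
# and rowSums() counts the TRUE values in each row
counts <- data.frame(V1 = df$V1, count = rowSums(df[-1] == "NC"))

# Print "ID count" lines with no header, row names or quotes
write.table(counts, quote = FALSE, row.names = FALSE, col.names = FALSE)
```

With the sample data this should print rs1 2, rs2 1 and rs3 3 on separate lines. Pass a file name as the second argument of write.table to save the result instead of printing it.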

Answer 2

dat <- read.table(text="rs1 NC AB NC rs2 AB NC AA rs3 NC NC NC")  # no line breaks, so this is read as ONE row of 12 columns
dat <- rbind(dat, dat, dat, dat)  # stack 4 copies to get a 4-row example

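You can apply table row by row to get the frequencies for each row. In this case the frequencies are the same for rows 1 to 4, because I replicated the data.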

freq <- apply(dat, 1, table)
freq
#     1 2 3 4   (columns are the row numbers of dat)
# AA  1 1 1 1
# AB  2 2 2 2
# NC  6 6 6 6
# rs1 1 1 1 1
# rs2 1 1 1 1
# rs3 1 1 1 1
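Note that apply(dat, 1, table) only simplifies to a matrix here because every replicated row contains the same set of values. With the question's three rows the per-row tables have different lengths, so apply would return a list instead. A minimal sketch that counts only "NC" per row sidesteps this. It assumes the question's layout (ID in the first column), and the output column names id and NC are illustrative choices.

```r
# The question's three rows, one line per record
dat <- read.table(text = "rs1 NC AB NC
rs2 AB NC AA
rs3 NC NC NC")

# apply() coerces the data frame to a character matrix and passes each row as x;
# sum(x == "NC") counts the matching cells in that row
nc <- apply(dat[-1], 1, function(x) sum(x == "NC"))

out <- data.frame(id = dat$V1, NC = nc)
out
```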

If you want to aggregate the frequencies across all rows, use

rowSums(freq)
#  AA  AB  NC rs1 rs2 rs3
#   4   8  24   4   4   4
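The same overall totals can also be sketched without building the per-row matrix first, by tabulating every cell of the data at once. This assumes the same replicated example data as above. as.matrix turns the data frame into a character matrix, and table counts its cells as one vector.

```r
# Same replicated example data as above
dat <- read.table(text = "rs1 NC AB NC rs2 AB NC AA rs3 NC NC NC")
dat <- rbind(dat, dat, dat, dat)

# Count every value across all rows and columns in one step
totals <- table(as.matrix(dat))
totals
```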