I have a data.frame like this:
>data
ID Orginal Modified
Sam_1 M K
Sam_1 K M
Sam_1 I J
Sam_1 M K
Sam_1 K M
Sam_2 K M
Sam_2 M K
Sam_3 J P
Sam_4 K M
Sam_4 M K
Sam_4 P J
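For each sample, I want to count the number of times "M" in the "Orginal" column is converted to "K" in the "Modified" column, and the number of times "K" in "Orginal" is converted to "M" in "Modified", and report the result in a tab-delimited text file like this: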
>newdata
ID M_to_K_counts K_to_M_counts
Sam_1 2 2
Sam_2 1 1
Sam_3 0 0
Sam_4 1 1
I tried the following code, but it failed:
counts=function()
{
for(i in 1:dim(rnaseqmut)[1])
{
mk_counts=0
km_counts=0
if(data$Original[i]=='M' & data$Modified[i]== 'K')
{
mk_counts=mk_counts+1
}
if(data$Original[i]=='K' & data$Modified[i]== 'M')
{
km_counts=km_counts+1
}
}
print(mk_counts)
print(km_counts)
}
How can I get the output in the format I want?
Answer 0 (score: 5)
One option is to use data.table. Convert the 'data.frame' to a 'data.table' (setDT(data)). Grouped by the 'ID' column, we take the sum of the elements where 'Orginal' is 'M' and 'Modified' is 'K' ('MtoKcount'), and do the reverse for 'KtoMcount'.
library(data.table)
setDT(data)[, list(MtoKcount=sum(Orginal=='M' & Modified=='K'),
               KtoMcount = sum(Orginal=='K' & Modified=='M')), by = ID]
#       ID MtoKcount KtoMcount
#1: Sam_1         2         2
#2: Sam_2         1         1
#3: Sam_3         0         0
#4: Sam_4         1         1
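Another option is table from base R. We paste the columns other than 'ID' (do.call(paste0, data[-1])) and use table to get the frequency counts. Then we subset the table output ('tbl') to keep only the columns named 'KM' and 'MK'.
tbl <- table(data$ID,do.call(paste0, data[-1]))[,c('KM', 'MK')]
tbl
#       KM MK
#Sam_1  2  2
#Sam_2  1  1
#Sam_3  0  0
#Sam_4  1  1
As @user295691 mentioned in the comments, we can also change the column names while we paste.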
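The question also asks for a tab-delimited text file, which this answer does not write out. A minimal base R sketch of one way to do that, assuming the example data from the question; the output file name counts.tsv and the variable names newdata and out_file are illustrative choices, not part of the original answer or of the commenter's suggestion:

```r
# Rebuild the example data from the question (column name "Orginal" kept as posted).
data <- data.frame(
  ID       = c(rep("Sam_1", 5), rep("Sam_2", 2), "Sam_3", rep("Sam_4", 3)),
  Orginal  = c("M", "K", "I", "M", "K", "K", "M", "J", "K", "M", "P"),
  Modified = c("K", "M", "J", "K", "M", "M", "K", "P", "M", "K", "J"),
  stringsAsFactors = FALSE
)

# Cross-tabulate ID against the pasted Orginal/Modified pair and keep MK and KM.
tbl <- table(data$ID, paste0(data$Orginal, data$Modified))[, c("MK", "KM")]

# Convert to a plain data.frame with the column names requested in the question.
newdata <- data.frame(
  ID            = rownames(tbl),
  M_to_K_counts = as.vector(tbl[, "MK"]),
  K_to_M_counts = as.vector(tbl[, "KM"]),
  stringsAsFactors = FALSE
)

# Hypothetical output path; written to a temporary directory in this sketch.
out_file <- file.path(tempdir(), "counts.tsv")
write.table(newdata, file = out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

Setting quote = FALSE and row.names = FALSE keeps the file to a header line plus one tab-separated line per sample, which is the layout shown in the question.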
Answer 1 (score: 3)
Base R, using xtabs. Getting the desired shape and subset requires a transpose and some fiddling with the container classes.
d<-as.matrix(ftable(xtabs(Count~Orginal+Modified+ID,transform(data,Count=1))))
as.data.frame(t(d))[,c("M_K","K_M")]
#       M_K K_M
# Sam_1   2   2
# Sam_2   1   1
# Sam_3   0   0
# Sam_4   1   1
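The one-liner above packs several conversions into a single expression. As a rough sketch, it can be unpacked step by step; this variant omits the helper Count column and uses a one-sided formula, which makes xtabs count rows directly. The intermediate names x3, f, d and res are illustrative only:

```r
# Example data from the question (column name "Orginal" kept as posted).
data <- data.frame(
  ID       = c(rep("Sam_1", 5), rep("Sam_2", 2), "Sam_3", rep("Sam_4", 3)),
  Orginal  = c("M", "K", "I", "M", "K", "K", "M", "J", "K", "M", "P"),
  Modified = c("K", "M", "J", "K", "M", "M", "K", "P", "M", "K", "J"),
  stringsAsFactors = FALSE
)

# Step 1: three-way contingency table; a one-sided formula counts rows.
x3 <- xtabs(~ Orginal + Modified + ID, data = data)

# Step 2: flatten to two dimensions; by default the last variable (ID)
# becomes the columns and the remaining variables the rows.
f <- ftable(x3)

# Step 3: as.matrix() on an ftable joins the row-variable levels with "_",
# giving row names such as "M_K" and "K_M".
d <- as.matrix(f)

# Step 4: transpose so samples are rows, then keep the two pairs of interest.
res <- as.data.frame(t(d))[, c("M_K", "K_M")]
```

Here M_K means Orginal "M" with Modified "K", because Orginal comes first in the formula.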
Answer 2 (score: 1)
Using dplyr:
library(dplyr)
x <- data.frame(ID = c(rep("Sam_1", 5), rep("Sam_2", 2), "Sam_3", rep("Sam_4", 3)),
                Orginal = c("M", "K", "I", "M", "K", "K", "M", "J", "K", "M", "P"),
                Modified = c("K", "M", "J", "K", "M", "M", "K", "P", "M", "K", "J"))
x %>%
  group_by(ID) %>%
  # sum() over the combined condition counts rows where both columns match
  summarise(M_to_K_counts = sum(Orginal == "M" & Modified == "K"),
            K_to_M_counts = sum(Orginal == "K" & Modified == "M"))
# Source: local data frame [4 x 3]
# ID M_to_K_counts K_to_M_counts
# 1 Sam_1 2 2
# 2 Sam_2 1 1
# 3 Sam_3 0 0
# 4 Sam_4 1 1
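A note on counting rows that satisfy two conditions at once: subsetting one logical vector by another and taking length() counts every row where the subsetting condition holds, whereas sum() over a combined condition counts only the rows where both hold. The two agree on the example data in this thread but can differ in general. A small base R sketch with made-up vectors (orig and modi are hypothetical, not taken from the question):

```r
# Hypothetical data: the third row is P -> K, so it is not an M -> K change.
orig <- c("M", "K", "P", "M")
modi <- c("K", "M", "K", "K")

# Counts every row whose modified value is "K", whatever the original was.
n_len <- length((orig == "M")[modi == "K"])

# Counts only rows where the original is "M" AND the modified value is "K".
n_sum <- sum(orig == "M" & modi == "K")

print(c(length_based = n_len, sum_based = n_sum))
```

With these vectors the length-based count is 3 and the sum-based count is 2, because the P to K row is included only by the first expression.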