Aggregating data frame columns

Time: 2015-08-07 14:26:28

Tags: r dataframe

I have a data.frame as follows:

>data
    ID     Orginal   Modified
    Sam_1    M         K
    Sam_1    K         M
    Sam_1    I         J
    Sam_1    M         K
    Sam_1    K         M
    Sam_2    K         M
    Sam_2    M         K
    Sam_3    J         P
    Sam_4    K         M
    Sam_4    M         K
    Sam_4    P         J 

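For each sample (ID), I want to count two things. The first is how many times "M" in the "Orginal" column was changed to "K" in the "Modified" column. The second is how many times "K" in "Orginal" was changed to "M" in "Modified". I want to report these counts in a tab-delimited text file, as follows: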

>newdata
    ID     M_to_K_counts  K_to_M_counts 
    Sam_1     2                2 
    Sam_2     1                1
    Sam_3     0                0
    Sam_4     1                1

I tried the following code, but it failed:

counts=function()
{
for(i in 1:dim(rnaseqmut)[1])
{
  mk_counts=0
  km_counts=0
  if(data$Original[i]=='M' & data$Modified[i]== 'K')
    {
       mk_counts=mk_counts+1
    }
  if(data$Original[i]=='K' & data$Modified[i]== 'M')
    {
       km_counts=km_counts+1
    }
}
print(mk_counts)
print(km_counts)
}

How can I get the output in the format I want?
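The loop above fails for several reasons:

* It iterates over rnaseqmut, which is not the data object shown.
* The column is spelled Orginal, so data$Original is NULL. This makes if() stop with "argument is of length zero".
* Both counters are reset to 0 on every iteration.
* Nothing separates the counts by ID.

Below is a minimal corrected loop. It assumes data has exactly the layout shown in the question. The helper name counts and its df argument are only for illustration.

```r
# Example data from the question (the column name "Orginal" is kept as spelled there)
data <- data.frame(
  ID       = c(rep("Sam_1", 5), rep("Sam_2", 2), "Sam_3", rep("Sam_4", 3)),
  Orginal  = c("M", "K", "I", "M", "K", "K", "M", "J", "K", "M", "P"),
  Modified = c("K", "M", "J", "K", "M", "M", "K", "P", "M", "K", "J"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: walk the rows once and keep one counter per ID.
# The counters are created before the loop, so they are not reset on each row.
counts <- function(df) {
  ids <- unique(df$ID)
  mk <- setNames(integer(length(ids)), ids)  # M -> K count for each ID
  km <- setNames(integer(length(ids)), ids)  # K -> M count for each ID
  for (i in seq_len(nrow(df))) {
    id <- df$ID[i]
    if (df$Orginal[i] == "M" && df$Modified[i] == "K") mk[id] <- mk[id] + 1L
    if (df$Orginal[i] == "K" && df$Modified[i] == "M") km[id] <- km[id] + 1L
  }
  data.frame(ID = ids, M_to_K_counts = unname(mk), K_to_M_counts = unname(km))
}

newdata <- counts(data)
print(newdata)
```

This version still works row by row. The vectorized approaches that follow are usually shorter and faster in R.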

3 answers:

Answer 0 (score: 5)

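One option is to use data.table. First convert the 'data.frame' to a 'data.table' with setDT(data). Then group by the 'ID' column and take the sum of the rows where 'Orginal' is 'M' and 'Modified' is 'K' ('MtoKcount'). 'KtoMcount' is computed the same way for the reverse change.

library(data.table)
setDT(data)[, list(MtoKcount = sum(Orginal=='M' & Modified=='K'),
                   KtoMcount = sum(Orginal=='K' & Modified=='M')), by = ID]
#       ID MtoKcount KtoMcount
#1: Sam_1         2         2
#2: Sam_2         1         1
#3: Sam_3         0         0
#4: Sam_4         1         1

Another option is table from base R. We paste together every column except 'ID' (do.call(paste0, data[-1])) and get the frequency counts with table. Then we subset the table output ('tbl') to keep only the columns named 'KM' and 'MK'.

Note that setDT converts data to a data.table in place. On a data.table, data[-1] drops the first row instead of the first column, so run this option on a plain data.frame.

As @user295691 mentioned in the comments, we can also change the column names at the paste step.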

 tbl <- table(data$ID,do.call(paste0, data[-1]))[,c('KM', 'MK')]
 tbl
 #      KM MK
 #Sam_1  2  2
 #Sam_2  1  1
 #Sam_3  0  0
 #Sam_4  1  1
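The comment's exact code is not reproduced here. One reading of it is to paste with sep = "_to_", so the table columns already come out as "M_to_K" and "K_to_M". The separator and the conversion to a data frame at the end are assumptions made for illustration.

```r
# Example data from the question
data <- data.frame(
  ID       = c(rep("Sam_1", 5), rep("Sam_2", 2), "Sam_3", rep("Sam_4", 3)),
  Orginal  = c("M", "K", "I", "M", "K", "K", "M", "J", "K", "M", "P"),
  Modified = c("K", "M", "J", "K", "M", "M", "K", "P", "M", "K", "J"),
  stringsAsFactors = FALSE
)

# Paste Orginal and Modified with "_to_" so that the column names of the
# cross-tabulation read like the requested output (e.g. "M_to_K").
tbl <- table(data$ID, do.call(paste, c(data[-1], sep = "_to_")))[, c("M_to_K", "K_to_M")]
print(tbl)

# Turn the 2-D table into a data frame with an explicit ID column.
newdata <- data.frame(ID = rownames(tbl), as.data.frame.matrix(tbl), row.names = NULL)
print(newdata)
```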

data
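Here is a sketch for recreating data from the table printed in the question. It assumes the whitespace-separated layout shown there. read.table(text = ...) is used for illustration, and it keeps the "Orginal" spelling from the header row.

```r
# Read the question's printed table straight from a string.
# header = TRUE takes the column names (ID, Orginal, Modified) from the first row;
# the leading blank line is skipped because blank.lines.skip = TRUE by default.
data <- read.table(text = "
ID     Orginal   Modified
Sam_1    M         K
Sam_1    K         M
Sam_1    I         J
Sam_1    M         K
Sam_1    K         M
Sam_2    K         M
Sam_2    M         K
Sam_3    J         P
Sam_4    K         M
Sam_4    M         K
Sam_4    P         J
", header = TRUE, stringsAsFactors = FALSE)

str(data)
```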

Answer 1 (score: 3)

Base R using xtabs. Getting the desired shape and subset requires a transpose and some fiddling with the container classes.

d<-as.matrix(ftable(xtabs(Count~Orginal+Modified+ID,transform(data,Count=1))))
as.data.frame(t(d))[,c("M_K","K_M")]
      M_K K_M
Sam_1   2   2
Sam_2   1   1
Sam_3   0   0
Sam_4   1   1
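Another base R route, closer to the "aggregating" in the title, avoids the transpose entirely. Flag each transition row by row, then sum the flags per ID with aggregate(). This is a sketch of a different technique from the xtabs approach. The flag column names are chosen to match the requested output.

```r
# Example data from the question
data <- data.frame(
  ID       = c(rep("Sam_1", 5), rep("Sam_2", 2), "Sam_3", rep("Sam_4", 3)),
  Orginal  = c("M", "K", "I", "M", "K", "K", "M", "J", "K", "M", "P"),
  Modified = c("K", "M", "J", "K", "M", "M", "K", "P", "M", "K", "J"),
  stringsAsFactors = FALSE
)

# One logical flag per row for each transition of interest.
flags <- transform(data,
                   M_to_K_counts = Orginal == "M" & Modified == "K",
                   K_to_M_counts = Orginal == "K" & Modified == "M")

# Sum the flags within each ID (TRUE counts as 1, FALSE as 0).
newdata <- aggregate(cbind(M_to_K_counts, K_to_M_counts) ~ ID, data = flags, FUN = sum)
print(newdata)
```

Because every ID has at least one row, IDs with no matching transitions (Sam_3) still appear, with zero counts.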

Answer 2 (score: 1)

Using dplyr:

x <- data.frame(ID = c(rep("Sam_1", 5), rep("Sam_2", 2), "Sam_3", rep("Sam_4", 3)), 
 Orginal = c("M", "K", "I", "M", "K", "K", "M", "J", "K", "M", "P"), 
 Modified = c("K", "M", "J", "K", "M", "M", "K", "P", "M", "K", "J"))

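library(dplyr)

# sum() over the combined condition counts rows where BOTH columns match;
# length((Orginal == "M")[Modified == "K"]) would count every row with Modified == "K".
x %>%
   group_by(ID) %>%
   summarise(M_to_K_counts = sum(Orginal == "M" & Modified == "K"), 
             K_to_M_counts = sum(Orginal == "K" & Modified == "M"))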

# Source: local data frame [4 x 3]

#      ID M_to_K_counts K_to_M_counts
# 1 Sam_1             2             2
# 2 Sam_2             1             1
# 3 Sam_3             0             0
# 4 Sam_4             1             1
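The question also asks for the result as a tab-delimited text file, which none of the answers shows. Below is a minimal sketch using base R's write.table. It assumes the counts are in a data frame called newdata, rebuilt here from the question's expected output so the snippet runs on its own. The file path under tempdir() is hypothetical.

```r
# The counts table requested in the question
newdata <- data.frame(ID            = c("Sam_1", "Sam_2", "Sam_3", "Sam_4"),
                      M_to_K_counts = c(2L, 1L, 0L, 1L),
                      K_to_M_counts = c(2L, 1L, 0L, 1L),
                      stringsAsFactors = FALSE)

out_file <- file.path(tempdir(), "counts.tsv")  # hypothetical output path

# sep = "\t" gives tab-delimited columns; quote = FALSE and row.names = FALSE
# keep the file to just the header line and one plain line per ID.
write.table(newdata, out_file, sep = "\t", quote = FALSE, row.names = FALSE)

cat(readLines(out_file), sep = "\n")
```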