Collapse a data frame by name, summing the integer columns and pasting the factor columns together

Posted: 2015-06-01 17:35:46

Tags: r

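I am trying to collapse a data frame by unique names. The data frame also has accompanying integer and factor columns. The integer columns need to be summed for each unique "Name" value, and the factor columns need all of their values pasted together, as shown below: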

Name        Number         Location
RUDU          6               SiteA
RUDU          4               SiteB
YHBL          50              SiteA
YHBL          60              SiteB

Collapsed by Name, this becomes:

Name        Number         Location
RUDU          10              SiteA,SiteB
YHBL          110             SiteA,SiteB

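I played around with ddply (from plyr), and it works for the integer part, but I'm at a loss as to how to aggregate the factor part as requested.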
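Since the question mentions ddply, here is a minimal sketch of the same idea in plyr. It assumes the plyr package is installed, and it rebuilds the question's data under the name df. The point is that summarise() accepts any per-group expression, so the integer column can be summed with sum() while the factor column is collapsed with paste(..., collapse = ","). Note that paste() uses the factor labels, not the integer codes.

```r
library(plyr)

# Example data from the question; Location is stored as a factor
df <- data.frame(
  Name     = c("RUDU", "RUDU", "YHBL", "YHBL"),
  Number   = c(6L, 4L, 50L, 60L),
  Location = factor(c("SiteA", "SiteB", "SiteA", "SiteB"))
)

# One row per Name: sum the integers, join the factor labels with ","
res <- ddply(df, "Name", summarise,
             Number   = sum(Number),
             Location = paste(Location, collapse = ","))
res
```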

3 answers:

Answer 0 (score: 6)

Here is one possible data.table approach:

library(data.table)
setDT(df)[, list(Number = sum(Number), Location = toString(Location)), by = Name]
#    Name Number     Location
# 1: RUDU     10 SiteA, SiteB
# 2: YHBL    110 SiteA, SiteB
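toString() joins values with ", " (a comma plus a space), but the desired output in the question uses a bare comma ("SiteA,SiteB"). If the exact separator matters, paste(..., collapse = ",") can be used inside the same data.table call. This is a minimal sketch that assumes the data.table package is installed and rebuilds the question's data as DT:

```r
library(data.table)

# Example data from the question (Location as a factor)
DT <- data.table(
  Name     = c("RUDU", "RUDU", "YHBL", "YHBL"),
  Number   = c(6L, 4L, 50L, 60L),
  Location = factor(c("SiteA", "SiteB", "SiteA", "SiteB"))
)

# paste() converts the factor to its labels; collapse = "," gives "SiteA,SiteB"
res <- DT[, list(Number   = sum(Number),
                 Location = paste(Location, collapse = ",")),
          by = Name]
res
```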

Since you mentioned plyr, here is a possible dplyr solution:

library(dplyr)
df %>%
  group_by(Name) %>%
  summarise(
            Number = sum(Number),
            Location = toString(Location)
            )

# Source: local data table [2 x 3]
# 
#   Name Number     Location
# 1 RUDU     10 SiteA, SiteB
# 2 YHBL    110 SiteA, SiteB
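In newer dplyr releases (1.1.0 and later), summarise() also accepts a .by argument. This removes the separate group_by() step, and the result comes back ungrouped. The following is a minimal sketch, assuming dplyr 1.1.0 or later is installed and that the data is rebuilt as df:

```r
library(dplyr)

# Example data from the question (Location as a factor)
df <- data.frame(
  Name     = c("RUDU", "RUDU", "YHBL", "YHBL"),
  Number   = c(6L, 4L, 50L, 60L),
  Location = factor(c("SiteA", "SiteB", "SiteA", "SiteB"))
)

# Per-group summary without group_by(); .by keeps groups in order of appearance
res <- df %>%
  summarise(Number   = sum(Number),
            Location = toString(Location),
            .by      = Name)
res
```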

Answer 1 (score: 2)

dplyr

library(dplyr)
# The underscore verbs group_by_()/summarize_() are deprecated; the plain verbs do the same job
d %>%
    group_by(Name) %>%
    summarize(Number = sum(Number), Location = paste(Location, collapse = ','))

Base R

merge(
  aggregate(Number ~ Name, data = d, FUN = sum),
  aggregate(Location ~ Name, data = d, FUN = paste, collapse = ',')
)
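Another base R option is to split the data frame by Name and build one summary row per piece. This avoids running two aggregate() calls and merging them. It is a general split-apply-combine sketch rather than what the answer above does. The data frame name d follows the answer, and the data is rebuilt from the question:

```r
# Example data from the question (Location as a factor)
d <- data.frame(
  Name     = c("RUDU", "RUDU", "YHBL", "YHBL"),
  Number   = c(6L, 4L, 50L, 60L),
  Location = factor(c("SiteA", "SiteB", "SiteA", "SiteB"))
)

# Split into one data frame per Name, summarise each piece, then stack the rows
pieces <- lapply(split(d, d$Name), function(g) {
  data.frame(Name     = g$Name[1],
             Number   = sum(g$Number),
             Location = paste(g$Location, collapse = ","),
             stringsAsFactors = FALSE)
})
res <- do.call(rbind, pieces)
rownames(res) <- NULL  # drop the "RUDU"/"YHBL" row names added by rbind
res
```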

Answer 2 (score: 1)

Two more approaches, for reference.

Function: tapply()

data.frame(
  Number = with(df1, tapply(Number, Name, sum)),
  Location = with(df1, tapply(Location, Name, toString))                  
)

#      Number     Location
# RUDU     10 SiteA, SiteB
# YHBL    110 SiteA, SiteB
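With tapply(), the group names end up as row names instead of a Name column. If you want a regular column, as in the question's desired output, you can move the names into a column. This is a sketch that uses the same df1 data (rebuilt here so the block runs on its own):

```r
# Example data from the question (Location as a factor)
df1 <- data.frame(
  Name     = c("RUDU", "RUDU", "YHBL", "YHBL"),
  Number   = c(6L, 4L, 50L, 60L),
  Location = factor(c("SiteA", "SiteB", "SiteA", "SiteB"))
)

num <- with(df1, tapply(Number, Name, sum))
loc <- with(df1, tapply(Location, Name, toString))

# names() of a one-dimensional tapply() result are the group labels;
# as.vector() strips the array attributes so the columns are plain vectors
res <- data.frame(Name     = names(num),
                  Number   = as.vector(num),
                  Location = as.vector(loc),
                  stringsAsFactors = FALSE)
res
```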

Function: by()

data.frame(cbind(
  Number = with(df1, by(Number, Name, sum)),
  Location =  with(df1, by(Location, Name, toString))
  )
)

#      Number     Location
# RUDU     10 SiteA, SiteB
# YHBL    110 SiteA, SiteB

Data

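# stringsAsFactors = TRUE keeps Location a factor, as in the question
# (the default changed to FALSE in R 4.0.0)
df1 <- read.table(text='Name        Number         Location
RUDU          6               SiteA
RUDU          4               SiteB
YHBL          50              SiteA
YHBL          60              SiteB', header=TRUE, stringsAsFactors=TRUE)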