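I am trying to collapse a data frame by unique Name, where the remaining columns are an integer and a factor. The integer column should be summed for each unique Name value, and the factor column should have all of its values pasted together, like this: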
Name Number Location
RUDU 6 SiteA
RUDU 4 SiteB
YHBL 50 SiteA
YHBL 60 SiteB
collapsed by Name to:
Name Number Location
RUDU 10 SiteA,SiteB
YHBL 110 SiteA,SiteB
I played around with ddply, which works for the integer part, but I am at a loss as to how to aggregate the factor part as requested.
Answer 0 (score: 6)
Here is a possible data.table approach
library(data.table)
# setDT() converts df to a data.table by reference
setDT(df)[, list(Number = sum(Number), Location = toString(Location)), by = Name]
#    Name Number     Location
# 1: RUDU     10 SiteA, SiteB
# 2: YHBL    110 SiteA, SiteB
Since you mention plyr, here is a possible dplyr solution
library(dplyr)
df %>%
  group_by(Name) %>%
  summarise(
    Number = sum(Number),
    Location = toString(Location)
  )
# Source: local data table [2 x 3]
#
#   Name Number     Location
# 1 RUDU     10 SiteA, SiteB
# 2 YHBL    110 SiteA, SiteB
Answer 1 (score: 2)
dplyr
library(dplyr)
# group_by_() and summarize_() are deprecated; the standard verbs take bare column names
d %>%
  group_by(Name) %>%
  summarize(Number = sum(Number), Location = paste(Location, collapse = ','))
Base R
merge(aggregate(Number ~ Name, data=d, FUN=sum), aggregate(Location ~ Name, data=d, FUN=paste, collapse=','))
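For comparison, the same split-apply-combine idea can be sketched with only split(), lapply() and rbind(). This is a minimal sketch rather than a recommended replacement for the one-liner above; the data frame d is rebuilt here from the question's example so the snippet runs on its own, and the names pieces and collapsed are made up for illustration.

```r
# Rebuild the example data from the question (Name and Location as factors).
d <- data.frame(
  Name = c("RUDU", "RUDU", "YHBL", "YHBL"),
  Number = c(6L, 4L, 50L, 60L),
  Location = c("SiteA", "SiteB", "SiteA", "SiteB"),
  stringsAsFactors = TRUE
)

# Split the rows by Name, reduce each piece to a single row,
# then stack the one-row pieces back into one data frame.
pieces <- lapply(split(d, d$Name), function(g) {
  data.frame(
    Name = as.character(g$Name[1]),
    Number = sum(g$Number),                       # integer column: add up
    Location = paste(g$Location, collapse = ","), # factor column: paste labels
    stringsAsFactors = FALSE
  )
})
collapsed <- do.call(rbind, pieces)
rownames(collapsed) <- NULL
print(collapsed)
```

paste() works on the factor's labels rather than its integer codes, so Location does not need to be converted with as.character() first.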
Answer 2 (score: 1)
Two more approaches, for reference.
Using tapply():
data.frame(
  Number = with(df1, tapply(Number, Name, sum)),
  Location = with(df1, tapply(Location, Name, toString))
)
# Number Location
# RUDU 10 SiteA, SiteB
# YHBL 110 SiteA, SiteB
Using by():
# Note: cbind() builds a character matrix here, so Number comes back as text, not as a number
data.frame(cbind(
  Number = with(df1, by(Number, Name, sum)),
  Location = with(df1, by(Location, Name, toString))
))
# Number Location
# RUDU 10 SiteA, SiteB
# YHBL 110 SiteA, SiteB
Data
# df1 <- read.table(text='Name Number Location
# RUDU 6 SiteA
# RUDU 4 SiteB
# YHBL 50 SiteA
# YHBL 60 SiteB', header=T)