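Aggregating to a string and summing the values associated with the aggregation in R 3.3.0, dplyr v0.5.0

Time: 2016-08-31 23:24:03

Tags: r dplyr

Question:

I have the following data frame, which I would like to simplify: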

library(dplyr)  # provides tbl_df(), used below

Fruit <- c("Apple","Apple","Orange","Orange","Banana","Banana")
Farmer <- c("Bob","Ben","Bill","Bob","George","Bob")
Tons.Jan <- c(20,40,10,20,35,15)
Tons.Feb <- c(30,40,20,15,25,30)
Tons.Mar <- c(10,10,15,10,20,30)
Tons.Apr <- c(15,20,15,30,30,30)
Tons.May <- c(20,5,20,20,20,10)

df <- cbind(Fruit,Farmer)
df <- cbind(df,Tons.Jan)
df <- cbind(df,Tons.Feb)
df <- cbind(df,Tons.Mar)
df <- cbind(df,Tons.Apr)
df <- tbl_df(cbind(df,Tons.May))

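I would like to collapse the Farmers into a single comma-separated string and sum the Tons across the observations for each Fruit.

The result I would like to get is the following: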

Fruit2 <- c("Apple","Orange","Banana")
Farmer2 <- c("Bob,Ben","Bill,Bob","George,Bob")
Tons.Jan2 <- c(60,30,50)
Tons.Feb2 <- c(70,35,55)
Tons.Mar2 <- c(20,25,50)
Tons.Apr2 <- c(35,45,60)
Tons.May2 <- c(25,40,30)

df2 <- cbind(Fruit2,Farmer2)
df2 <- cbind(df2,Tons.Jan2)
df2 <- cbind(df2,Tons.Feb2)
df2 <- cbind(df2,Tons.Mar2)
df2 <- cbind(df2,Tons.Apr2)
df2 <- tbl_df(cbind(df2,Tons.May2))

What I have tried:

I have tried using the dplyr functions group_by and summarise_each:

df <- df %>% group_by(Fruit) %>%
   summarise_each_(funs(toString))

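However, I am not sure how to also sum the numeric values without calling out each column explicitly in the summarise function.

Any help is appreciated.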

2 answers:

Answer 0: (score: 2)

library(dplyr)

# Convert the relevant columns to numeric
df <- mutate_each(df, funs(as.numeric), -Fruit, -Farmer)

# or as mentioned in the comments by jazzurro
df <- mutate_at(df, vars(starts_with("Tons")), as.numeric)

df %>% 
    group_by(Fruit) %>% 
    mutate(Farmer = toString(Farmer)) %>%
    group_by(Fruit, Farmer) %>%
    summarise_all(funs(sum))


#Source: local data frame [3 x 7]
#Groups: Fruit [?]
#
#   Fruit      Farmer Tons.Jan Tons.Feb Tons.Mar Tons.Apr Tons.May
#   <chr>       <chr>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
#1  Apple    Bob, Ben       60       70       20       35       25
#2 Banana George, Bob       50       55       50       60       30
#3 Orange   Bill, Bob       30       35       25       45       40
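
As far as I know, later dplyr releases deprecated funs() and superseded the scoped verbs (mutate_each, summarise_all and friends) in favour of across(). A minimal sketch of the same idea, assuming dplyr 1.0.0 or newer (so not the v0.5.0 from the question) and a data frame built with data.frame() so the Tons columns are already numeric. Only two of the Tons columns are included to keep it short, and the name result is just illustrative:

```r
library(dplyr)

# Build the data with data.frame() so the numeric columns stay numeric
df <- data.frame(
  Fruit    = c("Apple", "Apple", "Orange", "Orange", "Banana", "Banana"),
  Farmer   = c("Bob", "Ben", "Bill", "Bob", "George", "Bob"),
  Tons.Jan = c(20, 40, 10, 20, 35, 15),
  Tons.Feb = c(30, 40, 20, 15, 25, 30),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(Fruit) %>%
  summarise(
    Farmer = toString(Farmer),          # collapse names into "a, b"
    across(starts_with("Tons"), sum),   # sum every Tons.* column
    .groups = "drop"                    # return an ungrouped result
  )

result
```

Because both the string collapse and the sums happen in one summarise() call, the intermediate mutate() and second group_by() are not needed in this variant.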

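Answer 1: (score: 2)

It is better not to use data.frame(cbind( or tbl_df(cbind(. cbind on vectors binds them into a matrix, and a matrix can hold only a single class, so if any of the vectors is character, the whole matrix becomes character. Wrapping that in data.frame (with the default option stringsAsFactors=TRUE) makes it worse, because the columns are then converted to factor class. That is why we have to use as.numeric(as.character( to change the type of the columns back to numeric. It is better to construct the 'data.frame' as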

data.frame(Fruit, Farmer, Tons.Jan, ...)
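
To make the coercion concrete, here is a small base R sketch. It uses a shortened three-row subset of the question's vectors, and the names m, d and recovered are only illustrative:

```r
# A shortened subset of the question's data
Fruit    <- c("Apple", "Apple", "Orange")
Farmer   <- c("Bob", "Ben", "Bill")
Tons.Jan <- c(20, 40, 10)

# cbind() on vectors builds a matrix; mixing character and numeric
# vectors coerces every cell to character
m <- cbind(Fruit, Farmer, Tons.Jan)
class(m[, "Tons.Jan"])   # "character"

# data.frame() keeps each column's own type
d <- data.frame(Fruit, Farmer, Tons.Jan, stringsAsFactors = FALSE)
class(d$Tons.Jan)        # "numeric"

# If the data already went through cbind(), convert back explicitly;
# going through as.character() first also handles factor columns
recovered <- as.numeric(as.character(m[, "Tons.Jan"]))
sum(recovered)           # 70
```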

A data.table solution:

library(data.table)
setDT(df)[,  Farmer :=  toString(Farmer), by = Fruit][ , 
     lapply(.SD, function(x) sum(as.numeric(as.character(x)))) , .(Fruit, Farmer)]
#    Fruit                   Farmer Tons.Jan Tons.Feb Tons.Mar Tons.Apr Tons.May
#1:  Apple       Bob, Ben, Bob, Ben       60       70       20       35       25
#2: Orange     Bill, Bob, Bill, Bob       30       35       25       45       40
#3: Banana George, Bob, George, Bob       50       55       50       60       30

This can also be done in a single step, grouping by 'Fruit' only (matching the OP's expected output):

setDT(df)[, c(Farmer = toString(Farmer), lapply(.SD[, 
   setdiff(names(.SD), "Farmer"), with = FALSE], 
       function(x) sum(as.numeric(as.character(x))))), .(Fruit)]
#    Fruit      Farmer Tons.Jan Tons.Feb Tons.Mar Tons.Apr Tons.May
#1:  Apple    Bob, Ben       60       70       20       35       25
#2: Orange   Bill, Bob       30       35       25       45       40
#3: Banana George, Bob       50       55       50       60       30
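
If the data is built with numeric columns from the start, the as.numeric(as.character()) conversion is not needed, and .SDcols can restrict .SD to the Tons columns. A minimal sketch under those assumptions; the names dt, tons_cols and res are only illustrative, and just two Tons columns are included for brevity:

```r
library(data.table)

# Construct the table directly so the Tons columns are numeric
dt <- data.table(
  Fruit    = c("Apple", "Apple", "Orange", "Orange", "Banana", "Banana"),
  Farmer   = c("Bob", "Ben", "Bill", "Bob", "George", "Bob"),
  Tons.Jan = c(20, 40, 10, 20, 35, 15),
  Tons.Feb = c(30, 40, 20, 15, 25, 30)
)

# Columns to sum: everything whose name starts with "Tons"
tons_cols <- grep("^Tons", names(dt), value = TRUE)

# Per Fruit: collapse Farmer into one string and sum each Tons column.
# .SD holds only the columns named in .SDcols; Farmer is still
# visible inside j as an ordinary column.
res <- dt[, c(list(Farmer = toString(Farmer)), lapply(.SD, sum)),
          by = Fruit, .SDcols = tons_cols]

res
```

Since no := is involved here, dt itself is left unmodified, so rerunning the call does not repeat the farmer names.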