Question:
I have the following data frame, which I would like to simplify:
Fruit <- c("Apple","Apple","Orange","Orange","Banana","Banana")
Farmer <- c("Bob","Ben","Bill","Bob","George","Bob")
Tons.Jan <- c(20,40,10,20,35,15)
Tons.Feb <- c(30,40,20,15,25,30)
Tons.Mar <- c(10,10,15,10,20,30)
Tons.Apr <- c(15,20,15,30,30,30)
Tons.May <- c(20,5,20,20,20,10)
df <- cbind(Fruit,Farmer)
df <- cbind(df,Tons.Jan)
df <- cbind(df,Tons.Feb)
df <- cbind(df,Tons.Mar)
df <- cbind(df,Tons.Apr)
df <- tbl_df(cbind(df,Tons.May))
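I would like to collapse the Farmers into a comma-separated string and sum the Tons across the observations for each Fruit.
The result I am after looks like this: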
Fruit2 <- c("Apple","Orange","Banana")
Farmer2 <- c("Bob,Ben","Bill,Bob","George,Bob")
Tons.Jan2 <- c(60,30,50)
Tons.Feb2 <- c(70,35,55)
Tons.Mar2 <- c(20,25,50)
Tons.Apr2 <- c(35,45,60)
Tons.May2 <- c(25,40,30)
df2 <- cbind(Fruit2,Farmer2)
df2 <- cbind(df2,Tons.Jan2)
df2 <- cbind(df2,Tons.Feb2)
df2 <- cbind(df2,Tons.Mar2)
df2 <- cbind(df2,Tons.Apr2)
df2 <- tbl_df(cbind(df2,Tons.May2))
What I have tried:
I tried using the dplyr functions group_by and summarise_each:
df <- df %>% group_by(Fruit) %>%
summarise_each_(funs(toString))
But I am not sure how to also sum the numeric columns without naming each column explicitly in summarise.
Any help is appreciated.
Answer 0 (score: 2)
library(dplyr)
# Convert the relevant columns to numeric
df <- mutate_each(df, funs(as.numeric), -Fruit, -Farmer)
# or as mentioned in the comments by jazzurro
df <- mutate_at(df, vars(starts_with("Tons")), as.numeric)
df %>%
group_by(Fruit) %>%
mutate(Farmer = toString(Farmer)) %>%
group_by(Fruit, Farmer) %>%
summarise_all(funs(sum))
#Source: local data frame [3 x 7]
#Groups: Fruit [?]
#
# Fruit Farmer Tons.Jan Tons.Feb Tons.Mar Tons.Apr Tons.May
# <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 Apple Bob, Ben 60 70 20 35 25
#2 Banana George, Bob 50 55 50 60 30
#3 Orange Bill, Bob 30 35 25 45 40
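With a more recent dplyr (1.0.0 or later, where across() is available), the same result can probably be obtained in a single summarise call. The sketch below is not part of the original answer: it assumes the data is rebuilt with data.frame() so the Tons columns are already numeric, and it uses only two of the month columns to stay short.

```r
library(dplyr)

# Assumption: the question's data rebuilt with data.frame(), so Tons.* stay numeric
df <- data.frame(
  Fruit    = c("Apple", "Apple", "Orange", "Orange", "Banana", "Banana"),
  Farmer   = c("Bob", "Ben", "Bill", "Bob", "George", "Bob"),
  Tons.Jan = c(20, 40, 10, 20, 35, 15),
  Tons.Feb = c(30, 40, 20, 15, 25, 30),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(Fruit) %>%
  summarise(
    Farmer = toString(Farmer),          # collapse the names per fruit
    across(starts_with("Tons"), sum)    # sum every Tons.* column
  )
res
```

Because across() selects the Tons columns by name, no separate conversion step or second group_by is needed here.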
Answer 1 (score: 2)
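It is better not to use data.frame(cbind( or tbl_df(cbind(. cbind binds the vectors into a matrix, and a matrix can hold only a single class, so if any of the vectors is character, all the columns will be of character class. It gets worse when the matrix is converted to a data.frame with stringsAsFactors=TRUE (the default before R 4.0.0), because the columns are then changed to factor class. As a result, we need as.numeric(as.character( to change the type of the columns to numeric. It is better to construct the data.frame directly, as in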
data.frame(Fruit, Farmer, Tons.Jan, ...)
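The coercion described above can be checked with a small base R sketch. It uses a shortened, three-row version of the question's vectors, chosen only for illustration.

```r
Fruit    <- c("Apple", "Apple", "Orange")
Farmer   <- c("Bob", "Ben", "Bill")
Tons.Jan <- c(20, 40, 10)

# cbind() on vectors returns a matrix; mixing character and numeric
# vectors coerces every value to character.
m <- cbind(Fruit, Farmer, Tons.Jan)
class(m[, "Tons.Jan"])   # "character"

# data.frame() keeps each column's own type, so sums work directly.
d <- data.frame(Fruit, Farmer, Tons.Jan, stringsAsFactors = FALSE)
class(d$Tons.Jan)        # "numeric"
sum(d$Tons.Jan)          # 70
```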
data.table solution
library(data.table)
setDT(df)[, Farmer := toString(Farmer), by = Fruit][ ,
lapply(.SD, function(x) sum(as.numeric(as.character(x)))) , .(Fruit, Farmer)]
# Fruit Farmer Tons.Jan Tons.Feb Tons.Mar Tons.Apr Tons.May
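#1: Apple Bob, Ben 60 70 20 35 25
#2: Orange Bill, Bob 30 35 25 45 40
#3: Banana George, Bob 50 55 50 60 30
# Note: := modifies df by reference, so re-running this on the same df
# collapses Farmer again and repeats the names (e.g. "Bob, Ben, Bob, Ben").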
Alternatively, this can be done in a single step, grouping by 'Fruit' (matching the OP's expected output):
setDT(df)[, c(Farmer = toString(Farmer), lapply(.SD[,
setdiff(names(.SD), "Farmer"), with = FALSE],
function(x) sum(as.numeric(as.character(x))))), .(Fruit)]
# Fruit Farmer Tons.Jan Tons.Feb Tons.Mar Tons.Apr Tons.May
#1: Apple Bob, Ben 60 70 20 35 25
#2: Orange Bill, Bob 30 35 25 45 40
#3: Banana George, Bob 50 55 50 60 30
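If the table is built with numeric Tons columns in the first place, the as.numeric(as.character( step can be dropped. The following is a sketch under that assumption, not the answerer's code: the dt and tons_cols names are made up for illustration, and only two month columns are included.

```r
library(data.table)

# Assumption: data built directly as a data.table, so Tons.* are numeric
dt <- data.table(
  Fruit    = c("Apple", "Apple", "Orange", "Orange", "Banana", "Banana"),
  Farmer   = c("Bob", "Ben", "Bill", "Bob", "George", "Bob"),
  Tons.Jan = c(20, 40, 10, 20, 35, 15),
  Tons.Feb = c(30, 40, 20, 15, 25, 30)
)

# Hypothetical helper variable: names of the columns to sum
tons_cols <- grep("^Tons", names(dt), value = TRUE)

# One grouped step: collapse Farmer and sum the Tons columns
res <- dt[, c(list(Farmer = toString(Farmer)), lapply(.SD, sum)),
          by = Fruit, .SDcols = tons_cols]
res
```

Restricting .SD with .SDcols avoids the setdiff() on names(.SD), and nothing is modified by reference, so the call can be re-run safely.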