I have the following data frame in R:
Names Sum
Devpar - 1 10
Devpar - 2 10
Gadhashisha - 1 15
Gadhashisha - 2 15
Gadhashisha - 3 15
Mau Moti - 1 20
Mau Moti - 2 20
Makda 10
I want to strip the numbers from the Names column and add up the Sum values for each name.
The data frame I want is:
Names Sum
Devpar 20
Gadhashisha 45
Mau Moti 40
Makda 10
How can I do this in R?
Answer 0 (score: 3)
One option is to remove the suffix from the first column and then take the sum within each group:
library(tidyverse)
df1 %>%
group_by(Names = str_remove(Names, "\\s+-\\s+\\d+")) %>%
summarise(Sum = sum(Sum))
# A tibble: 4 x 2
# Names Sum
# <chr> <int>
#1 Devpar 20
#2 Gadhashisha 45
#3 Makda 10
#4 Mau Moti 40
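To see what the regular expression in the pipeline above strips, here is a minimal sketch. It uses base R's sub() instead of str_remove(), which is my substitution so the snippet needs no packages, and the vector x is an illustrative stand-in for the Names column.

```r
# Illustrative values mirroring the Names column in the question.
x <- c("Devpar - 1", "Mau Moti - 2", "Makda")

# Same pattern as in the answer: whitespace, a hyphen, whitespace, digits.
# Names without such a suffix (e.g. "Makda") do not match and stay as they are.
cleaned <- sub("\\s+-\\s+\\d+", "", x)
print(cleaned)
```

Because only the " - <digits>" part is removed, a name that itself contains a space, such as "Mau Moti", is kept whole.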
Data:
df1 <- structure(list(Names = c("Devpar - 1", "Devpar - 2", "Gadhashisha - 1",
"Gadhashisha - 2", "Gadhashisha - 3", "Mau Moti - 1", "Mau Moti - 2",
"Makda"), Sum = c(10L, 10L, 15L, 15L, 15L, 20L, 20L, 10L)), .Names = c("Names",
"Sum"), class = "data.frame", row.names = c(NA, -8L))
Answer 1 (score: 3)
A base R version, assuming df1 is the name of the data frame:
df1$NewName <- gsub("(.*)\\s+(-.*)","\\1" ,df1$Names)
aggregate( Sum ~ NewName, data=df1, sum)
# NewName Sum
#1 Devpar 20
#2 Gadhashisha 45
#3 Makda 10
#4 Mau Moti 40
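The result above has a column called NewName, while the question asks for Names. A minimal sketch of one way to rename it follows; the renaming step is my addition rather than part of the answer, and the three-row df1 here is a cut-down illustration of the real data.

```r
# Cut-down illustrative data (assumption: same shape as df1 in the question).
df1 <- data.frame(Names = c("Devpar - 1", "Devpar - 2", "Makda"),
                  Sum = c(10L, 10L, 10L),
                  stringsAsFactors = FALSE)

# Same two steps as in the answer.
df1$NewName <- gsub("(.*)\\s+(-.*)", "\\1", df1$Names)
res <- aggregate(Sum ~ NewName, data = df1, sum)

# Added step: rename the grouping column to match the requested output.
names(res)[names(res) == "NewName"] <- "Names"
print(res)
```

Note that the helper column NewName also remains in df1 afterwards; drop it with df1$NewName <- NULL if it is not wanted.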
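Answer 2 (score: 2)
1) base  Using only base R and assuming the input DF shown reproducibly in the Note at the end, we remove the suffix, compute the sums and remove the redundant rows. In r-devel (R 3.6) we can optionally replace sub(...) in the first line of code with trimws(Names, "right", "[- 0-9]").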
DF0 <- transform(DF, Names = sub(" - .*", "", Names))
unique(transform(DF0, Sum = ave(Sum, Names, FUN = sum)))
giving:
Names Sum
1 Devpar 20
2 Gadhashisha 45
3 Mau Moti 40
4 Makda 10
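The code above keeps the original row order (as in the output requested in the question). If sorted output is wanted instead, replace the last line of code above with: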
aggregate(Sum ~ Names, DF0, sum)
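The ave() plus unique() step in (1) may be easier to follow on a tiny example. The sketch below uses made-up vectors (Names and Sum here are illustrative, not the full data): ave() returns a vector as long as its input with each group's total repeated, and unique() then collapses the identical rows.

```r
# Illustrative vectors; the suffix is assumed to have been removed already.
Names <- c("Devpar", "Devpar", "Makda")
Sum <- c(10, 10, 10)

# ave() keeps the input length and repeats each group's total per row.
group_total <- ave(Sum, Names, FUN = sum)
print(group_total)

# Rows within a group are now identical, so unique() keeps one per group,
# in order of first appearance.
out <- unique(data.frame(Names = Names, Sum = group_total))
print(out)
```

This is why (1) preserves the original row order while aggregate() returns the groups sorted.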
1a) Using magrittr, (1) can be written as follows:
library(magrittr)
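DF %>%
# two transform() calls: within a single call, ave() would still see the original Names
transform(Names = sub(" - .*", "", Names)) %>%
transform(Sum = ave(Sum, Names, FUN = sum)) %>%
unique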
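2) sqldf  Using SQL we can express it as follows. It gives the same answer as (1). If the original order is not needed, omit the order by clause; if sorted order is wanted, replace it with order by 1.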
library(sqldf)
sqldf("select rtrim(Names, '- 0123456789') Names, sum(Sum) Sum
from DF
group by 1
order by rowid")
3) data.table  This is also easy in data.table and returns the rows in the same order as in the question:
library(data.table)
DT <- as.data.table(DF)
# naming the grouping expression keeps the column called Names instead of "sub"
DT[, .(Sum = sum(Sum)), by = .(Names = sub(" - .*", "", Names))]
Note
The input in reproducible form:
Lines <- "Names, Sum
Devpar - 1, 10
Devpar - 2, 10
Gadhashisha - 1, 15
Gadhashisha - 2, 15
Gadhashisha - 3, 15
Mau Moti - 1, 20
Mau Moti - 2, 20
Makda, 10"
DF <- read.csv(text = Lines)
Answer 3 (score: 1)
You can also use the following one-liner in base R:
aggregate(Sum ~ Names, transform(df1, Names = sub(' -.*','',Names)), sum)
Result:
Names Sum
1 Devpar 20
2 Gadhashisha 45
3 Makda 10
4 Mau Moti 40