How to grep text and aggregate

Date: 2018-06-17 13:37:56

Tags: r

I have the following data frame in R:
   Names            Sum
   Devpar - 1       10
   Devpar - 2       10
   Gadhashisha - 1  15
   Gadhashisha - 2  15
   Gadhashisha - 3  15
   Mau Moti - 1     20
   Mau Moti - 2     20
   Makda            10

I want to remove the numbers from the Names column and add up the sums within each name. My desired data frame is:

   Names            Sum
   Devpar           20
   Gadhashisha      45
   Mau Moti         40
   Makda            10

How can I do this in R?

4 Answers:

Answer 0 (score: 3)

One option is to remove the suffix part from the first column and then take the sum:

library(tidyverse)
df1 %>%
  group_by(Names = str_remove(Names, "\\s+-\\s+\\d+")) %>%
  summarise(Sum = sum(Sum))
# A tibble: 4 x 2
#  Names         Sum
#  <chr>       <int>
#1 Devpar         20
#2 Gadhashisha    45
#3 Makda          10
#4 Mau Moti       40

Data

df1 <- structure(list(Names = c("Devpar - 1", "Devpar - 2", "Gadhashisha - 1", 
"Gadhashisha - 2", "Gadhashisha - 3", "Mau Moti - 1", "Mau Moti - 2", 
"Makda"), Sum = c(10L, 10L, 15L, 15L, 15L, 20L, 20L, 10L)), .Names = c("Names", 
"Sum"), class = "data.frame", row.names = c(NA, -8L))

Answer 1 (score: 3)

A base R version could be the following, assuming df1 is the name of the data frame:

df1$NewName <- gsub("(.*)\\s+(-.*)","\\1" ,df1$Names)
aggregate( Sum ~ NewName, data=df1, sum)

#       NewName Sum
#1      Devpar  20
#2 Gadhashisha  45
#3       Makda  10
#4    Mau Moti  40
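A minimal sketch of how the capture-group pattern above behaves, compared with the simpler sub(" - .*", "", ...) used in other answers. The name "Foo - Bar - 1" is hypothetical and not part of the question's data; it is there only to show that the greedy (.*) strips the last " - ..." suffix, while the simpler pattern cuts at the first " - ".

```r
# Names from the question, plus one hypothetical name containing " - " twice
x <- c("Devpar - 1", "Mau Moti - 2", "Makda", "Foo - Bar - 1")

# Pattern from this answer: greedy (.*) keeps everything up to the LAST " -..."
greedy <- gsub("(.*)\\s+(-.*)", "\\1", x)

# Simpler pattern: removes everything from the FIRST " - " onwards
first <- sub(" - .*", "", x)

print(greedy)  # "Devpar" "Mau Moti" "Makda" "Foo - Bar"
print(first)   # "Devpar" "Mau Moti" "Makda" "Foo"
```

For the data in the question both patterns give the same groups; the difference only matters if a name itself can contain " - ".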

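Answer 2 (score: 2)

1) base Using only base R, and assuming the input DF shown reproducibly in the Note at the end, we remove the suffix, compute the sums, and remove the redundant rows. In r-devel (R 3.6) we could optionally replace the sub(...) in the first line of code with trimws(Names, "right", "[- 0-9]").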

DF0 <- transform(DF, Names = sub(" - .*", "", Names))
unique(transform(DF0, Sum = ave(Sum, Names, FUN = sum)))

giving:

           Names Sum
1         Devpar  20
2    Gadhashisha  45
3       Mau Moti  40
4          Makda  10

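The code above maintains the original row order (as in the output requested in the question), but if sorted output is wanted, replace the last line of code above with: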

aggregate(Sum ~ Names, DF0, sum)
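The trimws() alternative mentioned under (1) can be sketched as follows. This assumes R 3.6.0 or later, where trimws() has a whitespace argument; the data frame is rebuilt inline here only so that the snippet runs on its own.

```r
# Same data as in the question, built inline so the sketch is self-contained
DF <- data.frame(
  Names = c("Devpar - 1", "Devpar - 2", "Gadhashisha - 1", "Gadhashisha - 2",
            "Gadhashisha - 3", "Mau Moti - 1", "Mau Moti - 2", "Makda"),
  Sum = c(10L, 10L, 15L, 15L, 15L, 20L, 20L, 10L),
  stringsAsFactors = FALSE
)

# Strip any trailing run of hyphens, spaces and digits (needs R >= 3.6.0)
DF0 <- transform(DF, Names = trimws(Names, which = "right", whitespace = "[- 0-9]"))

# Per-group totals, then drop the now-duplicated rows; original order is kept
res <- unique(transform(DF0, Sum = ave(Sum, Names, FUN = sum)))
print(res)
```

One caveat with this variant: it also strips trailing digits that belong to the name itself (a name such as "Area 51" would become "Area"), so sub(" - .*", "", Names) is the safer choice when names may end in a number.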

1a) Using magrittr, (1) can be written as follows:

library(magrittr)

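DF %>%
   transform(Names = sub(" - .*", "", Names)) %>%
   # separate transform(): within a single call, ave() would still see the original Names
   transform(Sum = ave(Sum, Names, FUN = sum)) %>%
   unique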

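2) sqldf Using SQL we can express it as follows. It gives the same answer as (1). Omit the order by clause if the original order is not needed, or replace it with order by 1 if sorted order is wanted.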

library(sqldf)

sqldf("select rtrim(Names, '- 0123456789') Names, sum(Sum) Sum 
       from DF 
       group by 1 
       order by rowid")

3) data.table This is also easy in data.table, and it returns the rows in the same order as in the question:

library(data.table)

DT <- as.data.table(DF)
# naming the grouping expression keeps the column called Names instead of "sub"
DT[, list(Sum = sum(Sum)), by = list(Names = sub(" - .*", "", Names))]

Note

Lines <- "Names,            Sum
   Devpar - 1,       10
   Devpar - 2,       10
   Gadhashisha - 1,  15
   Gadhashisha - 2,  15
   Gadhashisha - 3,  15
   Mau Moti - 1,     20
   Mau Moti - 2,     20
   Makda,            10"
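# strip.white = TRUE drops the leading spaces that the indented lines would otherwise leave in Names
DF <- read.csv(text = Lines, strip.white = TRUE)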

Answer 3 (score: 1)

You can also use the following one-liner with base R:

aggregate(Sum ~ Names, transform(df1, Names = sub(' -.*','',Names)), sum)

Result:

        Names Sum
1      Devpar  20
2 Gadhashisha  45
3       Makda  10
4    Mau Moti  40
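aggregate() returns the groups in sorted order, whereas the output requested in the question keeps the order of first appearance. A minimal sketch of one way to get that order with aggregate(), assuming it is acceptable for Names to become a factor: set the factor levels to the stripped names in order of appearance. The variable names stripped and res are only illustrative.

```r
# Same data as in the question, built inline so the sketch is self-contained
df1 <- data.frame(
  Names = c("Devpar - 1", "Devpar - 2", "Gadhashisha - 1", "Gadhashisha - 2",
            "Gadhashisha - 3", "Mau Moti - 1", "Mau Moti - 2", "Makda"),
  Sum = c(10L, 10L, 15L, 15L, 15L, 20L, 20L, 10L),
  stringsAsFactors = FALSE
)

# Remove the " - <number>" suffix
stripped <- sub(" -.*", "", df1$Names)

# Levels in order of first appearance; aggregate() follows the level order
df1$Names <- factor(stripped, levels = unique(stripped))

res <- aggregate(Sum ~ Names, data = df1, FUN = sum)
print(res)
```

If Names should be a character column afterwards, as.character(res$Names) converts it back.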