Summarise data by a condition and create a new row (dplyr)

Time: 2017-12-03 14:45:51

Tags: r dplyr

I'll try to illustrate my problem with an example.

Example data frame:

myData <- data.frame(Country = c("Germany","UK","Mexico","Spain"),
                     MyCount = c(300,800,950,125),
                     Continent = c("Europe","Europe","America","Europe"))  

Country  MyCount Continent
Germany  300     Europe
UK       800     Europe
Mexico   950     America
Spain    125     Europe

Expected result:

Country MyCount Continent
Other   425     Europe
UK      800     Europe

I tried this:

myData %>%
  filter(Continent == "Europe" & MyCount < 800) %>%
  add_row(Country = "Other", MyCount = sum(MyCount), Continent = "Europe")
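The attempt above keeps only the rows that should be lumped together (so UK is lost), and add_row() evaluates its arguments outside the data frame, so sum(MyCount) cannot refer to the MyCount column. A minimal sketch that keeps the add_row() idea, assuming the 800 cut-off from the example and character (not factor) columns, computes the total first and then appends it. add_row() belongs to the tibble package, so the sketch loads it explicitly.

```r
library(dplyr)
library(tibble)  # add_row() is defined in tibble

# Example data; stringsAsFactors = FALSE keeps Country as character so the
# new "Other" value can be appended without factor-level issues
myData <- data.frame(Country = c("Germany", "UK", "Mexico", "Spain"),
                     MyCount = c(300, 800, 950, 125),
                     Continent = c("Europe", "Europe", "America", "Europe"),
                     stringsAsFactors = FALSE)

europe <- filter(myData, Continent == "Europe")

# Total of the small countries, computed before calling add_row()
other_total <- sum(europe$MyCount[europe$MyCount < 800])

result <- europe %>%
  filter(MyCount >= 800) %>%                         # keep the large countries
  add_row(Country = "Other", MyCount = other_total,  # append the lumped row
          Continent = "Europe", .before = 1)         # put it first
```

Here .before = 1 places the "Other" row first, matching the row order of the expected result.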

3 answers:

Answer 0 (score: 1)

@Mandy I'm not entirely clear on the specific requirements of your use case, but based on your comment this should work. It uses group_by and summarise from dplyr.

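library(dplyr)

myData %>%
  filter(Continent == 'Europe') %>%
  # as.character() stops ifelse() from returning factor codes when Country
  # is a factor (the data.frame() default before R 4.0.0)
  mutate(grp = ifelse(MyCount < 800, 'Other', as.character(Country))) %>%
  group_by(grp) %>%
  summarise(MyCount = sum(MyCount))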

# A tibble: 2 × 2
  grp   MyCount
  <chr>   <dbl>
1 Other     425
2 UK        800
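If the same rule should apply within every continent rather than only Europe, the filter() step can be replaced by grouping on Continent as well. A sketch, assuming the 800 cut-off applies to all continents and that Country is stored as character; the .groups argument needs dplyr 1.0.0 or later:

```r
library(dplyr)

myData <- data.frame(Country = c("Germany", "UK", "Mexico", "Spain"),
                     MyCount = c(300, 800, 950, 125),
                     Continent = c("Europe", "Europe", "America", "Europe"),
                     stringsAsFactors = FALSE)

threshold <- 800  # assumed cut-off, taken from the question's example

result <- myData %>%
  # relabel small countries; if_else() requires both branches to be character
  mutate(Country = if_else(MyCount < threshold, "Other", Country)) %>%
  # group within each continent so "Other" is summed per continent
  group_by(Continent, Country) %>%
  summarise(MyCount = sum(MyCount), .groups = "drop")
```

With the example data this yields one row for Mexico plus the two European rows from the expected result, and the Continent column is kept.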

Answer 1 (score: 1)

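If I'm reading your sample correctly, the following is one way to do it. You seem to want only the European rows, summarised into the countries whose MyCount is 800 or more plus one row for all the other European countries. If so, you can recode the European countries with a MyCount below 800 to the level "Other" (here with fct_other() from forcats) and then summarise the data.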

library(dplyr)
library(forcats)  # fct_other() comes from forcats, not dplyr

filter(myData, Continent == "Europe") %>%
  group_by(Country = fct_other(Country, keep = as.character(Country[MyCount >= 800]))) %>%
  summarise(MyCount = sum(MyCount))

#  Country MyCount
#   <fctr>   <dbl>
#1      UK     800
#2   Other     425
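To see what fct_other() does inside group_by(), here is a small sketch on plain vectors; the names country and counts are hypothetical stand-ins for the European rows of myData. Every level not listed in keep is collapsed into "Other", and summing by the recoded factor gives the same totals as the answer.

```r
library(forcats)

# Hypothetical vectors mirroring the European rows of myData
country <- c("Germany", "UK", "Spain")
counts <- c(300, 800, 125)

# keep = countries at or above 800; all other levels become "Other"
grouped <- fct_other(country, keep = country[counts >= 800])

# Sum the counts within each level of the recoded factor
totals <- tapply(counts, grouped, sum)
```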

Answer 2 (score: 0)

It's not entirely clear what you're looking for, but this gives you the result you posted in your question.

library(dplyr)
myData <- data.frame(Country = c("Germany", "UK", "Mexico", "Spain"),
                     MyCount = c(300, 800, 950, 125),
                     Continent = c("Europe", "Europe", "America", "Europe"))

myData %>%
    filter(Continent == 'Europe') %>%
    mutate(Country = as.character(Country),
           Country = ifelse(Country %in% c('UK'), Country, 'Other')) %>%
    group_by(Country, Continent) %>%
    summarize(MyCount = sum(MyCount)) %>%
    select(Country, MyCount, Continent)

# A tibble: 2 x 3
# Groups:   Country [2]
  Country MyCount Continent
    <chr>   <dbl>    <fctr>
1   Other     425    Europe
2      UK     800    Europe
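The hard-coded c('UK') only works for this sample. A variant that derives the kept countries from the 800 cut-off instead, and drops the leftover Country grouping shown in the output above, might look like this; it assumes dplyr 1.0.0 or later (for .groups) and character columns:

```r
library(dplyr)

myData <- data.frame(Country = c("Germany", "UK", "Mexico", "Spain"),
                     MyCount = c(300, 800, 950, 125),
                     Continent = c("Europe", "Europe", "America", "Europe"),
                     stringsAsFactors = FALSE)

result <- myData %>%
  filter(Continent == "Europe") %>%
  # keep a country's name only if it meets the assumed 800 cut-off
  mutate(Country = if_else(MyCount >= 800, Country, "Other")) %>%
  group_by(Country, Continent) %>%
  # .groups = "drop" returns an ungrouped result
  summarise(MyCount = sum(MyCount), .groups = "drop") %>%
  select(Country, MyCount, Continent)
```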