max([column]) where name = (each unique name in the name column), with its year, in R

Time: 2017-10-20 23:03:00

Tags: r

I'm practicing with the babynames data in R.

total_n <- babynames %>%
    mutate(name_gender = paste(name, sex)) %>%
    group_by(year) %>%
    summarise(total_n = sum(n, na.rm = TRUE)) %>%
    arrange(total_n)

bn <- inner_join(babynames, total_n, by = "year")

df <- bn %>%
    mutate(pct_of_names = n / total_n) %>%
    group_by(name, year) %>%
    summarise(pct = sum(pct_of_names))

The resulting data frame has one row per name and year, with the columns name, year, and pct.

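For each name, it contains every year along with that year's pct. I'm stuck on getting, for each name, the year with the highest pct. How can I do that?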

1 Answer:

Answer 0 (score: 2)

This is pretty simple once you know where the babynames data comes from. You already have everything you need:

library(dplyr)
library(babynames)

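# Total number of babies recorded per year
total_n <- babynames %>%
    mutate(name_gender = paste(name, sex)) %>%  # not used below
    group_by(year) %>%
    summarise(total_n = sum(n, na.rm = TRUE)) %>%
    arrange(total_n)

# Attach each year's total to every row
bn <- inner_join(babynames, total_n, by = "year")

# Share of each name within its year, summed over both sexes
df <- bn %>%
    mutate(pct_of_names = n / total_n) %>%
    group_by(name, year) %>%
    summarise(pct = sum(pct_of_names))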

You were only missing the last step:

df %>%
    group_by(name) %>% 
    filter(pct == max(pct))

# A tibble: 95,025 x 3
# Groups:   name [95,025]
        name  year          pct
       <chr> <dbl>        <dbl>
 1     Aaban  2014 4.338256e-06
 2     Aabha  2014 2.440269e-06
 3     Aabid  2003 1.316094e-06
 4 Aabriella  2015 1.363073e-06
 5      Aada  2015 1.363073e-06
 6     Aadam  2015 5.997520e-06
 7     Aadan  2009 6.031433e-06
 8   Aadarsh  2014 4.880538e-06
 9     Aaden  2009 3.335645e-04
10    Aadesh  2011 1.370356e-06
# ... with 95,015 more rows

group_by and filter are your friends.
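One thing worth checking is how ties are handled: filter(pct == max(pct)) keeps every row that ties for a name's maximum, so a name can show up more than once. The sketch below uses a small made-up tibble (toy, with invented values, not the real babynames data) to show that behaviour, along with slice_max() as one possible alternative when exactly one row per name is wanted. It assumes dplyr 1.0.0 or later, where slice_max() is available.

```r
library(dplyr)

# Toy stand-in for the name/year/pct data frame (values are made up).
toy <- tibble(
  name = c("Ann", "Ann", "Ann", "Bob", "Bob"),
  year = c(2000, 2001, 2002, 2000, 2001),
  pct  = c(0.10, 0.30, 0.20, 0.05, 0.05)
)

# filter() keeps every row tied for the maximum, so Bob appears twice.
peak_all <- toy %>%
  group_by(name) %>%
  filter(pct == max(pct)) %>%
  ungroup()

# slice_max() with with_ties = FALSE keeps exactly one row per name.
peak_one <- toy %>%
  group_by(name) %>%
  slice_max(pct, n = 1, with_ties = FALSE) %>%
  ungroup()

print(peak_all)
print(peak_one)
```

Which variant fits depends on whether a tied peak year should be reported once or for every year that reaches it.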