R / dplyr: Summarize data without grouping

Time: 2020-06-09 15:07:52

Tags: r dplyr summarize

I have a data frame like this:

ID V1 V2
A  2  June
B  3  May
A  2  January
F  4  December

I want to add a column V3 that gives the earliest V2 entry (the earliest month) within each ID:

ID V1 V2        V3
A  2  June      January
B  3  May       May
A  2  January   January
F  4  December  December

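How can I do this?

2 answers:

Answer 0: (score: 1)

If you want the earliest month in V2 for each ID, you can group by ID, compute it, and then ungroup again (see the comments in the code below for details):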

# load packages
library(tidyverse)
library(lubridate)

# data
data <- read.table(header = TRUE, text = "
    ID V1 V2
    A  2  June
    B  3  May
    A  2  January
    F  4  December
")

# 1. group by ID
# 2. parse the month names with the 'lubridate' package and take the earliest month number per group
# 3. ungroup
# 4. convert the month numbers back to full month names, again with 'lubridate'
data %>%
    group_by(ID) %>%
    mutate(V3 = min(month(parse_date_time(V2, "%m")))) %>%
    ungroup() %>%
    mutate(V3 = month(V3, label = TRUE, abbr = FALSE))
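
For comparison, here is a minimal sketch of the same group-then-ungroup pattern without lubridate. It assumes that V2 always contains full English month names, and it looks each name up in base R's built-in `month.name` vector. The names `df` and `result` are only illustrative.

```r
library(dplyr)

df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
    ID V1 V2
    A  2  June
    B  3  May
    A  2  January
    F  4  December
")

# match() turns each month name into its position (1-12) in month.name;
# the smallest position within each ID is that ID's earliest month
result <- df %>%
    group_by(ID) %>%
    mutate(V3 = month.name[min(match(V2, month.name))]) %>%
    ungroup()

print(result)
```

In this sketch V3 is a plain character column, whereas `month(..., label = TRUE)` in the code above returns an ordered factor.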

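Answer 1: (score: 0)

Not strictly dplyr, but I think this is easy to read (at least there are not many nested parentheses). Also: my minmonth function is handy to reuse elsewhere, and it can easily be adapted to non-English input: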

dat <- read.table(text = "ID V1 V2
                           A  2  June
                           B  3  May
                           A  2  January
                           F  4  December", header = TRUE)

minmonth <- function(m){
  months <- c(January = 1, February = 2, March = 3,  # easily translated to
             April = 4, May = 5, June = 6, July = 7, # other languages
             August = 8, September = 9, October = 10,
             November = 11, December = 12)
  m <- months[as.character(m)]  # look up by name; as.character() guards against factor input
  smallest <- min(m)            # number of the earliest month
  return(names(months)[smallest])
}

dat$V3 <- ave(dat$V2, dat$ID, FUN = minmonth)
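
The point about non-English input could be sketched by passing the ordered month names in as an argument. The `min_month` helper, its `month_names` parameter, and the Dutch sample data below are hypothetical illustrations, not part of the answer above.

```r
# Hypothetical variant of minmonth: the ordered month names are a parameter,
# defaulting to base R's English month.name
min_month <- function(m, month_names = month.name) {
  pos <- match(as.character(m), month_names)  # position (1-12) of each month
  month_names[min(pos)]                       # name of the earliest one
}

# Assumed example input: Dutch month names, in calendar order
dutch <- c("januari", "februari", "maart", "april", "mei", "juni",
           "juli", "augustus", "september", "oktober", "november", "december")

dat_nl <- data.frame(ID = c("A", "B", "A", "F"),
                     V2 = c("juni", "mei", "januari", "december"),
                     stringsAsFactors = FALSE)

# ave() applies the function within each ID and recycles the single result
dat_nl$V3 <- ave(dat_nl$V2, dat_nl$ID,
                 FUN = function(m) min_month(m, dutch))

print(dat_nl)
```

A name that does not appear in `month_names` makes `match()` return `NA`, so the whole group would get `NA` in this sketch.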