I have a data frame like this:
ID V1 V2
A 2 June
B 3 May
A 2 January
F 4 December
I want to add a column V3 that gives, for each ID, the earliest V2 entry among that ID's rows:
ID V1 V2 V3
A 2 June January
B 3 May May
A 2 January January
F 4 December December
How can I do this?
Answer 0 (score: 1)
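If you want the earliest month in V2 for each ID, you can group by ID, compute it, and then ungroup again (see the comments in the code below for details):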
# load packages
library(tidyverse)
library(lubridate)
# data
data <- read.table(header = TRUE, text = "
ID V1 V2
A 2 June
B 3 May
A 2 January
F 4 December
")
# 1. group by ID
# 2. get the earliest month with parsing by 'lubridate' package
# 3. ungroup
# 4. make months to words with 'lubridate' again
data %>%
  group_by(ID) %>%
  mutate(V3 = min(month(parse_date_time(V2, "%m")))) %>%
  ungroup() %>%
  mutate(V3 = month(V3, label = TRUE, abbr = FALSE))
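Answer 1 (score: 0)
This is not strictly dplyr, but I think it is easy to read (at least there are not many nested parentheses). Also, my minmonth function is easy to reuse elsewhere and easy to adapt to non-English month names: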
dat <- read.table(text = "ID V1 V2
A 2 June
B 3 May
A 2 January
F 4 December", header = TRUE)
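minmonth <- function(m){
  months <- c(January = 1, February = 2, March = 3,   # easily translated to
              April = 4, May = 5, June = 6, July = 7, # other languages
              August = 8, September = 9, October = 10,
              November = 11, December = 12)
  # look up month numbers by name; as.character() guards against factor
  # input, which read.table produces by default in R < 4.0
  m <- months[as.character(m)]
  smallest <- min(m)
  # the values 1..12 match the positions, so the number indexes the names
  return(names(months)[smallest])
}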
dat$V3 <- ave(dat$V2, dat$ID, FUN = minmonth)
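As a variation on the lookup-table idea above, here is a minimal base R sketch that uses R's built-in month.name constant instead of a hand-written table. It assumes V2 holds full English month names spelled exactly as in month.name. The names month_num and earliest are only illustrative, and the data frame is rebuilt here so the snippet runs on its own.

```r
# Example data from the question, rebuilt so this snippet is self-contained
dat <- data.frame(
  ID = c("A", "B", "A", "F"),
  V1 = c(2, 3, 2, 4),
  V2 = c("June", "May", "January", "December"),
  stringsAsFactors = FALSE
)

# 1. Map each month name to its number (1..12) via the built-in month.name
month_num <- match(dat$V2, month.name)

# 2. Take the smallest month number within each ID; ave() returns a vector
#    of the same length as its input, with the group value repeated per row
earliest <- ave(month_num, dat$ID, FUN = min)

# 3. Map the numbers back to month names
dat$V3 <- month.name[earliest]

print(dat)
```

A value of V2 that does not appear in month.name (for example an abbreviation such as "Jun") would become NA in step 1, so the input would need to be normalized first in that case.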