Say I have the following dataset:
mydf <- data.frame("MemberID"=c("111","0111A","0111B","112","0112A"),
"resign.date"=c("2013/01/01","2013/01/01","2013/01/01","2014/03/01","2014/03/01"))
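Note: 111, 112 and 113 are the IDs of the family representatives.
I want to do two things:
a) If I have a resign date for a family representative, e.g. for 111, I want to copy the same resign date to 0111A and 0111B (these are the spouse and child of 111, in case you are wondering).
b) If I do not have a resign date for a family representative, e.g. 113, I just want to delete the rows for 113 and 0113B.
My resulting data frame should look like this: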
{{1}}
Thanks in advance.
Answer 0: (score: 1)
If resign.date is available only for (some) MemberIDs without a trailing letter, you can use data.table:
library(data.table)
df <- data.table( "MemberID"=c("0111","0111A","0111B","0112","0112A","0113","0113B"),
"resign.date"=c("2013/01/01",NA,NA,"2014/03/01",NA,NA,NA))
df <- df[order(MemberID)] ## sort so that, within each family, the ID without a trailing letter comes first
df[, myID := gsub("\\D+", "", MemberID)] ## create myID col : MemberID with all non-digit characters removed
df[ , my.resign.date := resign.date[1L], by = myID] ## assign the first resign date within each myID group to every row of the group
df <- df[!is.na(my.resign.date)] ## drop rows whose my.resign.date is missing
Edit:
If MemberID is inconsistent (some IDs have a leading 0, some do not), you can try a workaround.
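One possible workaround for the leading-zero inconsistency is sketched below. It is an illustration, not the answerer's original code. It assumes that the digits of MemberID identify the family once leading zeros are ignored, and that a representative row is one whose ID contains no letter. The sample data and the helper columns myID and is.rep are made up for this example.

```r
library(data.table)

# Hypothetical data: representatives lack the leading 0, dependants have it
df <- data.table(
  MemberID    = c("111", "0111A", "0111B", "112", "0112A", "113", "0113B"),
  resign.date = c("2013/01/01", NA, NA, "2014/03/01", NA, NA, NA)
)

# Numeric family key: as.integer() drops leading zeros, so "111" and "0111A" match
df[, myID := as.integer(gsub("\\D+", "", MemberID))]

# Flag representative rows (assumption: their ID contains no letter)
df[, is.rep := !grepl("[A-Za-z]", MemberID)]

# Copy the representative's date to the whole family; NA if there is none
df[, resign.date := resign.date[is.rep][1L], by = myID]

# Drop families without a resign date, then remove the helper columns
df <- df[!is.na(resign.date)]
df[, c("myID", "is.rep") := NULL]
```

Because the representative is picked through is.rep rather than by position, this version does not depend on how the rows are sorted.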
Answer 1: (score: 1)
We can also use tidyverse:
library(tidyverse)
mydf %>%
group_by(grp = parse_number(MemberID)) %>%
mutate(resign.date = first(resign.date)) %>%
na.omit() %>%
ungroup() %>%
select(-grp)
# A tibble: 5 x 2
# MemberID resign.date
# <fctr> <fctr>
#1 0111 2013/01/01
#2 0111A 2013/01/01
#3 0111B 2013/01/01
#4 0112 2014/03/01
#5 0112A 2014/03/01
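For comparison, the same group-and-fill idea can be sketched in base R without extra packages. This is only an illustrative equivalent. It assumes the representative row comes first within each family, as in the sample data, and the grp and result names are introduced for this example.

```r
# Sample data with missing dates for dependants and for family 113
mydf <- data.frame(
  MemberID    = c("0111", "0111A", "0111B", "0112", "0112A", "0113", "0113B"),
  resign.date = c("2013/01/01", NA, NA, "2014/03/01", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Family key: digits of the ID as an integer
grp <- as.integer(gsub("\\D+", "", mydf$MemberID))

# Spread the first date of each family over all of its rows
mydf$resign.date <- ave(mydf$resign.date, grp, FUN = function(x) x[1L])

# Keep only families that have a resign date
result <- mydf[!is.na(mydf$resign.date), ]
```

As with first() in the pipeline above, x[1L] takes whatever row happens to come first in the group, so the data should be sorted with the representative on top if that is not already guaranteed.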