Here are two data frames.
Sample data:
df1
ID First.seen Last.seen
A10 2015-09-07 2015-09-16
A11 2015-09-07 2015-09-19
df2
ID First_seen Last_seen
A1 2015-09-07 0
A10 2015-09-07 0
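I want to fill in df2$Last_seen with the value from df1 whenever the ID appears in both data frames. Note that in the real data I have several IDs that occur in both data frames. I tried a for loop, but I get numeric values instead of dates: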
for (i in 1:nrow(df2)) {
  if (df2$ID[i] %in% df1$ID) {
    df2$Last_seen[i] <- df1$Last.seen[df1$ID == df2$ID[i]]
  } else {
    df2$Last_seen[i] <- 0
  }
}
I found this answer to the same problem, which uses seq_along, but when I apply that code I get df2$Last_seen[i] == 1 as the result:
for (i in seq_along(1:nrow(df2))) {
  if (df2$ID[i] %in% df1$ID) {
    df2$Last_seen[i] <- df1$Last.seen[df1$ID == df2$ID[i]]
  } else {
    df2$Last_seen[i] <- 0
  }
}
Any suggestions on how to do this correctly?
Answer 0 (score: 0)
You don't need a loop for this. You need to join the two tables on ID, which can be done with dplyr:
df1 <- read.table(text="ID First.seen Last.seen
A10 2015-09-07 2015-09-16
A11 2015-09-07 2015-09-19",header=TRUE, stringsAsFactors=FALSE)
df2<- read.table(text="ID First_seen Last_seen
A1 2015-09-07 0
A10 2015-09-07 0",header=TRUE, stringsAsFactors=FALSE)
library(dplyr)
left_join(df2,df1)
ID First_seen Last_seen First.seen Last.seen
1 A1 2015-09-07 0 <NA> <NA>
2 A10 2015-09-07 0 2015-09-07 2015-09-16
If you want a three-column table:
left_join(df2, df1, by = c("ID" = "ID", "First_seen" = "First.seen")) %>%
  mutate(Last_seen = ifelse(is.na(Last.seen), Last_seen, Last.seen)) %>%
  select(-Last.seen)
ID First_seen Last_seen
1 A1 2015-09-07 0
2 A10 2015-09-07 2015-09-16
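For comparison, the same fill can be sketched in base R with match(), without dplyr. This is only an illustration under a few assumptions: the data frames are rebuilt here with data.frame() instead of read.table(), the names idx, found and result are made up for the example, and the lookup matches on ID alone, whereas the join above also matches on First_seen.

```r
# Sample data from the question, rebuilt with character columns (assumption).
df1 <- data.frame(ID = c("A10", "A11"),
                  First.seen = c("2015-09-07", "2015-09-07"),
                  Last.seen = c("2015-09-16", "2015-09-19"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = c("A1", "A10"),
                  First_seen = c("2015-09-07", "2015-09-07"),
                  Last_seen = c(0, 0),
                  stringsAsFactors = FALSE)

# Position of each df2 ID inside df1; NA where the ID is absent from df1.
idx <- match(df2$ID, df1$ID)
found <- !is.na(idx)

# Work on a copy and store Last_seen as character so dates and "0" can coexist.
result <- df2
result$Last_seen <- as.character(result$Last_seen)
result$Last_seen[found] <- df1$Last.seen[idx[found]]

print(result)
```

Note that match() returns only the first matching position, so if an ID occurs more than once in df1 this sketch silently uses the first row for it.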
Edit: to replace the rows where Last_seen is still 0, you can add another ifelse:
left_join(df2, df1, by = c("ID" = "ID", "First_seen" = "First.seen")) %>%
  mutate(Last_seen = ifelse(is.na(Last.seen), Last_seen, Last.seen),
         # rows still at 0 get First_seen + 16 days; all other rows keep their current value
         Last_seen = ifelse(Last_seen == 0, format(as.Date(First_seen) + 16, "%Y-%m-%d"), Last_seen)) %>%
  select(-Last.seen)
ID First_seen Last_seen
1 A1 2015-09-07 2015-09-23
2 A10 2015-09-07 2015-09-16
EDIT2
left_join(df2, df1, by = c("ID" = "ID", "First_seen" = "First.seen")) %>%
  mutate(Last_seen = ifelse(is.na(Last.seen), Last_seen, Last.seen),
         # rows still at 0 get First_seen + 16 days; all other rows keep their current value
         Last_seen = ifelse(Last_seen == 0, format(as.Date(First_seen) + 16, "%Y-%m-%d", origin = "1900-01-01"), Last_seen)) %>%
  select(-Last.seen)
ID First_seen Last_seen
1 A1 2015-09-07 2015-09-23
2 A10 2015-09-07 2015-09-16
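As for why the original loop produced numbers: one likely cause, which the question does not confirm, is that df1$Last.seen was stored as a factor (the default for read.table and data.frame before R 4.0). Assigning a factor element into a numeric vector stores the factor's integer level code, not its label. The sketch below reproduces that under this assumption; df1_factor, last_seen and fixed are names invented for the example.

```r
# Assumption: Last.seen is a factor, as older R versions created by default.
df1_factor <- data.frame(ID = c("A10", "A11"),
                         Last.seen = c("2015-09-16", "2015-09-19"),
                         stringsAsFactors = TRUE)

# Numeric target, like a Last_seen column that holds only zeros.
last_seen <- c(0, 0)

# The factor's integer code is stored, not the date label.
last_seen[2] <- df1_factor$Last.seen[df1_factor$ID == "A10"]
print(last_seen)

# Converting to character first keeps the date text.
fixed <- c("0", "0")
fixed[2] <- as.character(df1_factor$Last.seen[df1_factor$ID == "A10"])
print(fixed)
```

If this is the cause, reading the data with stringsAsFactors = FALSE, as in the answer above, or wrapping the right-hand side of the loop assignment in as.character(), should avoid the numeric codes.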