I have two data frames:
df
set.seed(10)
df <- data.frame(Name = c("Bob", "John", "Jane", "John", "Bob", "Jane", "Jane"),
                 Date = as.Date(c("2014-06-04", "2013-12-04", "2013-11-04", "2013-12-06",
                                  "2014-01-09", "2014-03-21", "2014-09-24")),
                 Degrees = rnorm(7, mean = 32, sd = 32))
Name | Date | Degrees
Bob | 2014-06-04 | 50.599877
John | 2013-12-04 | 44.103919
Jane | 2013-11-04 | 6.117422
John | 2013-12-06 | 30.826633
Bob | 2014-01-09 | 59.425444
Jane | 2014-03-21 | 62.473418
Jane | 2014-09-24 | 11.341562
df2
df2 <- data.frame(Name = c("Bob", "John", "Jane"),
                  Date = as.Date(c("2014-03-01", "2014-01-20", "2014-06-07")),
                  Weather = c("Good weather", "Bad weather", "Good weather"))
Name | Date | Weather
Bob | 2014-03-01 | Good weather
John | 2014-01-20 | Bad weather
Jane | 2014-06-07 | Good weather
I would like to extract the following:
Name | Date | Weather | Degrees (until this Date) | Other measures
Bob | 2014-03-01 | Good weather | 59.425444 | 50.599877
John | 2014-01-20 | Bad weather | 44.103919, 30.826633 |
Jane | 2014-06-07 | Good weather | 6.117422, 62.473418 | 11.341562
That is, a merge of df and df2 that includes:
Answer 0 (score: 1)
Here is one way:
library(dplyr)
library(tidyr)
library(magrittr)  # for set_names()

res <-
  left_join(df, df2 %>% select(Name, Date, Weather), by = "Name") %>%
  # Date.x is the measurement date from df, Date.y the cutoff date from df2
  mutate(period = factor(Date.x <= Date.y, levels = c(TRUE, FALSE),
                         labels = c("before", "other"))) %>%
  group_by(Name, period) %>%
  # collapse the Degrees of each Name/period into one comma-separated string
  mutate(Degrees = paste(Degrees, collapse = ", ")) %>%
  distinct() %>%
  ungroup() %>%
  spread(period, Degrees) %>%
  group_by(Name, Date.y, Weather) %>%
  # take the first non-missing string per column instead of relying on row order
  summarise(before = before[!is.na(before)][1],
            other = other[!is.na(other)][1]) %>%
  ungroup() %>%
  set_names(c("Name", "Date", "Weather", "Degrees (until this Date)", "Other measures"))
res[is.na(res)] <- ""
res
#   Name       Date      Weather           Degrees (until this Date)    Other measures
# 1  Bob 2014-03-01 Good weather                    41.4254440501603  32.5998774701384
# 2 Jane 2014-06-07 Good weather -11.8825775975204, 44.4734176224054 -6.65843761374357
# 3 John 2014-01-20  Bad weather     26.10391865379, 12.826633094921
There is probably room for improvement, but it works.
Answer 1 (score: 1)
Another option:
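As one possible tightening of the same join-then-reshape idea, here is a minimal sketch that summarises first and then reshapes with pivot_wider(). It assumes dplyr >= 1.0 (for the .groups argument) and tidyr >= 1.0 (for pivot_wider()). The names Measured, period, until, other and res2 are made up for this illustration and do not come from the answer above.

```r
library(dplyr)
library(tidyr)

# Example data from the question
set.seed(10)
df <- data.frame(
  Name = c("Bob", "John", "Jane", "John", "Bob", "Jane", "Jane"),
  Date = as.Date(c("2014-06-04", "2013-12-04", "2013-11-04", "2013-12-06",
                   "2014-01-09", "2014-03-21", "2014-09-24")),
  Degrees = rnorm(7, mean = 32, sd = 32)
)
df2 <- data.frame(
  Name = c("Bob", "John", "Jane"),
  Date = as.Date(c("2014-03-01", "2014-01-20", "2014-06-07")),
  Weather = c("Good weather", "Bad weather", "Good weather")
)

res2 <- df %>%
  # rename the measurement date so it cannot clash with the cutoff date in df2
  rename(Measured = Date) %>%
  inner_join(df2, by = "Name") %>%
  # "until": measured on or before the cutoff; "other": measured after it
  mutate(period = if_else(Measured <= Date, "until", "other")) %>%
  group_by(Name, Date, Weather, period) %>%
  # one comma-separated string per Name and period
  summarise(Degrees = paste(Degrees, collapse = ", "), .groups = "drop") %>%
  # one column per period; a Name without values in a period gets ""
  pivot_wider(names_from = period, values_from = Degrees,
              values_fill = list(Degrees = "")) %>%
  # both periods occur in the example data, so both columns exist here
  select(Name, Date, Weather, until, other)

res2
```

Summarising before reshaping leaves exactly one row per Name and period, so nothing depends on row order or on positional indexing.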
# a common set of names, so that every split() below yields the same groups in the same order
nms = unique(c(as.character(df$Name), as.character(df2$Name)))
# split the data by name
dates = split(df$Date, factor(df$Name, nms))
degrees = split(df$Degrees, factor(df$Name, nms))
thresholds = split(df2$Date, factor(df2$Name, nms))
# for each name, paste the degrees measured up to the threshold (TRUE) and after it (FALSE)
res = do.call(rbind.data.frame,
              Map(function(date, thres, deg)
                      tapply(deg, factor(date <= thres, c(TRUE, FALSE)),
                             paste0, collapse = ", "),
                  dates, thresholds, degrees))
# bind with df2, reordering the rows of res to follow the order of df2
cbind(df2, setNames(res[match(df2$Name, row.names(res)), ], c("Degrees", "Other")))
#      Name       Date      Weather                             Degrees             Other
# Bob   Bob 2014-03-01 Good weather                    41.4254440501603  32.5998774701384
# John John 2014-01-20  Bad weather     26.10391865379, 12.826633094921              <NA>
# Jane Jane 2014-06-07 Good weather -11.8825775975204, 44.4734176224054 -6.65843761374357
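The explicit levels c(TRUE, FALSE) in the factor are what keep the result rectangular: tapply() returns one cell per level, and a level with no observations comes back as NA instead of being dropped. A small sketch with rounded, made-up values for a single name (deg, date, thres and out are illustrative only):

```r
# Two measurements, both taken before the threshold date
deg   <- c(26.1, 12.8)
date  <- as.Date(c("2013-12-04", "2013-12-06"))
thres <- as.Date("2014-01-20")

# Fixing the levels guarantees a "TRUE" and a "FALSE" cell for every name
out <- tapply(deg, factor(date <= thres, c(TRUE, FALSE)),
              paste0, collapse = ", ")

out[["TRUE"]]   # the values measured up to the threshold, pasted together
out[["FALSE"]]  # NA, because nothing was measured after the threshold
```

Without the fixed levels, a name with measurements on only one side of the threshold would yield a length-one result, and the rows could no longer be bound into two consistent columns.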