Concatenating rows in R based on a specific range of row values

Time: 2015-02-27 13:07:16

Tags: r merge dataframe concatenation

I have two data frames:

DF

set.seed(10)
df <- data.frame(Name = c("Bob","John","Jane","John","Bob","Jane","Jane"), 
Date=as.Date(c("2014-06-04", "2013-12-04", "2013-11-04" , "2013-12-06" ,
"2014-01-09", "2014-03-21", "2014-09-24")), Degrees= rnorm(7, mean=32, sd=32))

Name |  Date       | Degrees
Bob  |  2014-06-04 | 50.599877
John |  2013-12-04 | 44.103919
Jane |  2013-11-04 | 6.117422
John |  2013-12-06 | 30.826633
Bob  |  2014-01-09 | 59.425444
Jane |  2014-03-21 | 62.473418
Jane |  2014-09-24 | 11.341562

DF2

df2 <- data.frame(Name = c("Bob","John","Jane"),
Date=as.Date(c("2014-03-01", "2014-01-20", "2014-06-07")),
Weather = c("Good weather","Bad weather", "Good weather"))

Name |  Date       | Weather
Bob  |  2014-03-01 | Good weather
John |  2014-01-20 | Bad weather
Jane |  2014-06-07 | Good weather

I would like to extract the following:

Name |  Date       | Weather      | Degrees (until this Date) | Other measures
Bob  |  2014-03-01 | Good weather | 59.425444                 | 50.599877
John |  2014-01-20 | Bad weather  | 44.103919, 30.826633      |
Jane |  2014-06-07 | Good weather | 6.117422, 62.473418       | 11.341562

This is a merge between df and df2, where:

  • "Degrees (until this Date)" concatenates the df$Degrees values dated up to df2$Date;
  • "Other measures" holds any df$Degrees values dated after df2$Date.
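
To make the rule concrete, here is a minimal base-R sketch for a single person (Jane), using the values shown in the tables above. It assumes that a measurement taken on the reference date itself counts as "until this Date"; the variable names are illustrative only.

```r
# Jane's measurements, typed in from the df table above
jane_dates   <- as.Date(c("2013-11-04", "2014-03-21", "2014-09-24"))
jane_degrees <- c(6.117422, 62.473418, 11.341562)

# Jane's reference date, from df2
cutoff <- as.Date("2014-06-07")

# TRUE for measurements dated on or before the reference date (assumed inclusive)
on_or_before <- jane_dates <= cutoff

# Collapse each side of the cutoff into one comma-separated string
until_date <- paste(jane_degrees[on_or_before], collapse = ", ")
other      <- paste(jane_degrees[!on_or_before], collapse = ", ")

until_date
other
```

The two strings correspond to Jane's "Degrees (until this Date)" and "Other measures" cells in the desired output.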

2 answers:

Answer 0 (score: 1)

Here's one way:

library(dplyr)
library(tidyr)
library(magrittr)
res <- 
  left_join(df, df2 %>% select(Name, Date, Weather), by = "Name") %>%
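  # Factor levels sort as FALSE, TRUE, so the labels are given in that order:
  # "other" marks dates after Date.y, "before" marks dates up to Date.y
  mutate(period = factor(Date.x <= Date.y, labels = c("other", "before"))) %>%
  group_by(Name, period) %>%
  mutate(Degrees = paste(Degrees, collapse = ", ")) %>%
  distinct() %>%
  spread(period, Degrees) %>%
  group_by(Name, Date.y, Weather) %>%
  summarise(before = before[1], other = other[2]) %>%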
  set_names(c("Name", "Date" , "Weather", "Degrees (until this Date)" , "Other measures"))
res[is.na(res)] <- ""
res
#   Name       Date      Weather           Degrees (until this Date)    Other measures
# 1  Bob 2014-03-01 Good weather                    41.4254440501603  32.5998774701384
# 2 Jane 2014-06-07 Good weather -11.8825775975204, 44.4734176224054 -6.65843761374357
# 3 John 2014-01-20  Bad weather     26.10391865379, 12.826633094921        

There is probably room for improvement, but anyway.
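
For readers without dplyr and tidyr, the same layout can be sketched in base R. This is only one possible variant, not the answer's method: the helper `collapse_degrees` and the result name `res_base` are hypothetical, and the data are typed in from the question's tables rather than regenerated with `rnorm`.

```r
# Data typed in from the question's tables (not regenerated with rnorm)
df <- data.frame(
  Name = c("Bob", "John", "Jane", "John", "Bob", "Jane", "Jane"),
  Date = as.Date(c("2014-06-04", "2013-12-04", "2013-11-04", "2013-12-06",
                   "2014-01-09", "2014-03-21", "2014-09-24")),
  Degrees = c(50.599877, 44.103919, 6.117422, 30.826633,
              59.425444, 62.473418, 11.341562)
)
df2 <- data.frame(
  Name = c("Bob", "John", "Jane"),
  Date = as.Date(c("2014-03-01", "2014-01-20", "2014-06-07")),
  Weather = c("Good weather", "Bad weather", "Good weather")
)

# Hypothetical helper: for row i of df2, paste the degrees of the same person
# dated on/before that row's date (before = TRUE) or after it (before = FALSE)
collapse_degrees <- function(i, before) {
  same_name <- as.character(df$Name) == as.character(df2$Name[i])
  in_range  <- if (before) df$Date <= df2$Date[i] else df$Date > df2$Date[i]
  paste(df$Degrees[same_name & in_range], collapse = ", ")
}

rows <- seq_len(nrow(df2))
res_base <- df2
res_base[["Degrees (until this Date)"]] <-
  vapply(rows, collapse_degrees, character(1), before = TRUE)
res_base[["Other measures"]] <-
  vapply(rows, collapse_degrees, character(1), before = FALSE)
res_base
```

Because `paste()` on an empty vector with `collapse` returns an empty string, a person with no later measurements (John here) gets "" instead of NA, so no separate NA replacement step is needed.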

Answer 1 (score: 1)

Another option:

# Common set of names, so that all three splits share the same groups in the same order
nms = unique(c(as.character(df$Name), as.character(df2$Name)))

# Split the data by name
dates = split(df$Date, factor(df$Name, nms))
degrees = split(df$Degrees, factor(df$Name, nms))
thresholds = split(df2$Date, factor(df2$Name, nms))

# For each name, paste the degrees dated up to the threshold (TRUE) and after it (FALSE)
res = do.call(rbind.data.frame, 
              Map(function(date, thres, deg) 
                      tapply(deg, factor(date <= thres, c(TRUE, FALSE)), 
                             paste0, collapse = ", "), 
                  dates, thresholds, degrees))
# Reorder res to follow df2$Name, then bind it to df2
cbind(df2, setNames(res[match(df2$Name, row.names(res)), ], c("Degrees", "Other")))
#     Name       Date      Weather                             Degrees             Other
#Bob   Bob 2014-03-01 Good weather                    41.4254440501603  32.5998774701384
#John John 2014-01-20  Bad weather     26.10391865379, 12.826633094921              <NA>
#Jane Jane 2014-06-07 Good weather -11.8825775975204, 44.4734176224054 -6.65843761374357
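
The `<NA>` in John's row comes from the way `tapply` handles a factor level with no observations. A small sketch with John's values from the question's table (the variable names here are illustrative only):

```r
# John's two measurements, both dated before his threshold (2014-01-20)
deg  <- c(44.103919, 30.826633)
cond <- c(TRUE, TRUE)

# Declaring both levels up front guarantees a "TRUE" and a "FALSE" cell,
# even when one of them has no observations
out <- tapply(deg, factor(cond, levels = c(TRUE, FALSE)), paste0, collapse = ", ")

out[["TRUE"]]   # the pasted degrees
out[["FALSE"]]  # NA, because no measurement falls after the threshold
```

Fixing the levels this way keeps every per-name result at the same length, which is what lets the results be row-bound into a single data frame.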