Create data frame variables whose values depend on a matching condition

Time: 2018-01-18 22:51:18

Tags: r

I have two separate data frames: Data1 and Data2.

head(Data1)
  BeginDate    Value      EndDate
 04/01/2002   350000   06/15/2012
 09/01/2001   220000   02/07/2016
 11/01/2016   473000   01/01/2017

head(Data2)
      Date   HPI
01/01/1998   156
02/01/1998   158
03/01/1998   161
         .     .
         .     .
01/01/2017   209   

I would like to end up with the following:

head(Data1)
  BeginDate    Value      EndDate   BeginHPI   EndHPI
 02/01/1998   350000   06/15/2012        158      191
 09/01/2001   220000   02/07/2016        173      199
 11/01/2016   473000   01/01/2017        202      209

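BeginHPI and EndHPI are the HPI values returned where

Data1$BeginDate==Data2$Date

Data1$EndDate==Data2$Date

respectively. I have seen similar requests, but my question differs in that I do not want to use ifelse statements, or anything else that requires me to write out the dates, because there are too many possible dates. I should note that this is a simplified example, because my real 'Data1' has around 400,000 observations and 30 variables. 'Data2' is shown in the real format of that dataset. I am essentially trying to attach a macroeconomic time series to a large panel dataset.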

1 answer:

Answer 0: (score: 1)

Perhaps this approach will help:

I changed the data slightly so that some of the dates match:

z1
   BeginDate  Value    EndDate
1 04/01/2002 350000 06/15/2012
2 09/01/2001 220000 02/07/2016
3 11/01/2016 473000 01/01/2017

z2
        Date HPI
1 04/01/2002 156
2 02/07/2016 158
3 11/01/2016 161

library(tidyverse)
z1 %>%
  # join on BeginDate after renaming Date to BeginDate in the second data frame
  left_join(z2 %>% rename(BeginDate = Date), by = "BeginDate") %>%
  rename(BeginHPI = HPI) %>% # rename HPI to BeginHPI
  # second join, this time on EndDate
  left_join(z2 %>% rename(EndDate = Date), by = "EndDate") %>%
  rename(EndHPI = HPI)
# output
   BeginDate  Value    EndDate BeginHPI EndHPI
1 04/01/2002 350000 06/15/2012      156     NA
2 09/01/2001 220000 02/07/2016       NA    158
3 11/01/2016 473000 01/01/2017      161     NA
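
For a table with hundreds of thousands of rows, the same lookup can also be written in base R with `match()`, without any joins or renaming. The following is only a minimal sketch, not part of the original answer. It rebuilds the toy data above with the dates stored as character strings (an assumption; the `dput` output below stores them as factors):

```r
# Same toy data as in the answer, with dates kept as character strings
z1 <- data.frame(
  BeginDate = c("04/01/2002", "09/01/2001", "11/01/2016"),
  Value     = c(350000L, 220000L, 473000L),
  EndDate   = c("06/15/2012", "02/07/2016", "01/01/2017"),
  stringsAsFactors = FALSE
)
z2 <- data.frame(
  Date = c("04/01/2002", "02/07/2016", "11/01/2016"),
  HPI  = c(156L, 158L, 161L),
  stringsAsFactors = FALSE
)

# match() gives, for each date in z1, the row position of that date in z2
# (NA when the date is absent); indexing z2$HPI with it performs the lookup
z1$BeginHPI <- z2$HPI[match(z1$BeginDate, z2$Date)]
z1$EndHPI   <- z2$HPI[match(z1$EndDate,   z2$Date)]
z1
```

Dates with no counterpart in `z2` come back as `NA`, just as with `left_join()`.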

Data used:

> dput(z1)
structure(list(BeginDate = structure(1:3, .Label = c("04/01/2002", 
"09/01/2001", "11/01/2016"), class = "factor"), Value = c(350000L, 
220000L, 473000L), EndDate = structure(c(3L, 2L, 1L), .Label = c("01/01/2017", 
"02/07/2016", "06/15/2012"), class = "factor")), .Names = c("BeginDate", 
"Value", "EndDate"), class = "data.frame", row.names = c(NA, 
-3L))

> dput(z2)
structure(list(Date = structure(c(2L, 1L, 3L), .Label = c("02/07/2016", 
"04/01/2002", "11/01/2016"), class = "factor"), HPI = c(156L, 
158L, 161L)), .Names = c("Date", "HPI"), class = "data.frame", row.names = c(NA, 
-3L))
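
Two points about this data may matter in practice: the dates are stored as factors, and the question's Data2 is monthly (dated on the first of each month), while Data1 contains mid-month dates such as 06/15/2012. The sketch below is one possible way to deal with both, and it rests on an assumption the question does not confirm: that a mid-month date should take the HPI of its calendar month. The Data1 rows and the helper names `to_date` and `month_key` are made up for illustration; the Data2 rows are the first three shown in the question.

```r
# Monthly series: first three rows of Data2 from the question
Data2 <- data.frame(
  Date = c("01/01/1998", "02/01/1998", "03/01/1998"),
  HPI  = c(156L, 158L, 161L),
  stringsAsFactors = FALSE
)
# Hypothetical panel rows (made up for illustration), one with a mid-month EndDate
Data1 <- data.frame(
  BeginDate = factor(c("01/01/1998", "02/01/1998")),
  EndDate   = factor(c("02/15/1998", "03/01/1998"))
)

# Parse month/day/year text into Date; as.character() makes this safe for factors
to_date <- function(x) as.Date(as.character(x), format = "%m/%d/%Y")

# Reduce a date to a "YYYY-MM" key so that any day of a month maps to that month
month_key <- function(x) format(to_date(x), "%Y-%m")

hpi_key <- month_key(Data2$Date)
Data1$BeginHPI <- Data2$HPI[match(month_key(Data1$BeginDate), hpi_key)]
Data1$EndHPI   <- Data2$HPI[match(month_key(Data1$EndDate),   hpi_key)]
Data1
```

If exact-date matching is what is wanted instead, matching on `to_date(...)` directly rather than on the month key keeps the original behaviour.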