I have two separate data frames: Data1 and Data2.
head(Data1)
BeginDate Value EndDate
04/01/2002 350000 06/15/2012
09/01/2001 220000 02/07/2016
11/01/2016 473000 01/01/2017
head(Data2)
Date HPI
01/01/1998 156
02/01/1998 158
03/01/1998 161
. .
. .
01/01/2017 209
I want to end up with the following:
head(Data1)
BeginDate Value EndDate BeginHPI EndHPI
02/01/1998 350000 06/15/2012 158 191
09/01/2001 220000 02/07/2016 173 199
11/01/2016 473000 01/01/2017 202 209
BeginHPI and EndHPI are the HPI values returned where:
Data1$BeginDate == Data2$Date
and
Data1$EndDate == Data2$Date
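The lookup rule above can be expressed directly in base R with `match()`, without writing out any dates. The sketch below is only an illustration: it reuses the small three-row example data from the answer, with the dates stored as "MM/DD/YYYY" character strings, and performs exact date matching only.

```r
# Toy data: the three-row example from the answer, dates as "MM/DD/YYYY" strings
z1 <- data.frame(
  BeginDate = c("04/01/2002", "09/01/2001", "11/01/2016"),
  Value     = c(350000L, 220000L, 473000L),
  EndDate   = c("06/15/2012", "02/07/2016", "01/01/2017"),
  stringsAsFactors = FALSE
)
z2 <- data.frame(
  Date = c("04/01/2002", "02/07/2016", "11/01/2016"),
  HPI  = c(156L, 158L, 161L),
  stringsAsFactors = FALSE
)

# match() gives, for each date in z1, the row of z2 with the same date
# (NA when there is none); indexing z2$HPI with that performs the lookup
z1$BeginHPI <- z2$HPI[match(z1$BeginDate, z2$Date)]
z1$EndHPI   <- z2$HPI[match(z1$EndDate,   z2$Date)]
z1
```

This should give the same BeginHPI/EndHPI columns as the two joins in the answer (156, NA, 161 and NA, 158, NA).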
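I've seen similar requests, but my question is different because I don't want to use ifelse statements, or anything else that requires me to write out the dates, since there are too many possible dates. I should note that this is a simplified example, because my real Data1 contains roughly 400,000 observations and 30 variables. Data2 is shown in its real format. I'm basically trying to attach a macroeconomic time series to a large panel dataset.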
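Data2 holds one value per month, dated on the 1st, while some Data1 dates fall mid-month (06/15/2012, 02/07/2016), and the desired output still shows an HPI for them. Assuming that a mid-month date should pick up its month's index value, one sketch is to reduce every date to a "YYYY-MM" key and look that key up with `match()`. The Data2 rows are the four values shown in the question; the Data1 rows and the helper name `month_key` are hypothetical, chosen only for illustration.

```r
# Hypothetical Data1 rows (mid-month dates made up for illustration);
# Data2 uses the four monthly values shown in the question
Data1 <- data.frame(
  BeginDate = c("02/15/1998", "03/01/1998"),
  Value     = c(350000, 220000),
  EndDate   = c("01/20/2017", "01/01/2017"),
  stringsAsFactors = FALSE
)
Data2 <- data.frame(
  Date = c("01/01/1998", "02/01/1998", "03/01/1998", "01/01/2017"),
  HPI  = c(156L, 158L, 161L, 209L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn any "MM/DD/YYYY" string into a "YYYY-MM" month key
month_key <- function(x) format(as.Date(x, format = "%m/%d/%Y"), "%Y-%m")

# Build the index keys once, then look up both date columns; everything is
# vectorised, so no per-date code has to be written
hpi_key <- month_key(Data2$Date)
Data1$BeginHPI <- Data2$HPI[match(month_key(Data1$BeginDate), hpi_key)]
Data1$EndHPI   <- Data2$HPI[match(month_key(Data1$EndDate),   hpi_key)]
Data1
```

If exact-day matching is what is actually wanted, dropping `month_key()` and matching the raw date strings gives the exact-match behaviour instead.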
Answer:
Perhaps this approach will help.
I changed the data slightly so that the dates match:
z1
BeginDate Value EndDate
1 04/01/2002 350000 06/15/2012
2 09/01/2001 220000 02/07/2016
3 11/01/2016 473000 01/01/2017
z2
Date HPI
1 04/01/2002 156
2 02/07/2016 158
3 11/01/2016 161
library(tidyverse)
z1 %>%
  # merge on BeginDate after renaming Date to BeginDate in the second data frame
  left_join(z2 %>% rename(BeginDate = Date), by = "BeginDate") %>%
  rename(BeginHPI = HPI) %>%  # rename HPI to BeginHPI
  # second merge, now on EndDate
  left_join(z2 %>% rename(EndDate = Date), by = "EndDate") %>%
  rename(EndHPI = HPI)        # rename HPI to EndHPI
#output
BeginDate Value EndDate BeginHPI EndHPI
1 04/01/2002 350000 06/15/2012 156 NA
2 09/01/2001 220000 02/07/2016 NA 158
3 11/01/2016 473000 01/01/2017 161 NA
Data used:
> dput(z1)
structure(list(BeginDate = structure(1:3, .Label = c("04/01/2002",
"09/01/2001", "11/01/2016"), class = "factor"), Value = c(350000L,
220000L, 473000L), EndDate = structure(c(3L, 2L, 1L), .Label = c("01/01/2017",
"02/07/2016", "06/15/2012"), class = "factor")), .Names = c("BeginDate",
"Value", "EndDate"), class = "data.frame", row.names = c(NA,
-3L))
> dput(z2)
structure(list(Date = structure(c(2L, 1L, 3L), .Label = c("02/07/2016",
"04/01/2002", "11/01/2016"), class = "factor"), HPI = c(156L,
158L, 161L)), .Names = c("Date", "HPI"), class = "data.frame", row.names = c(NA,
-3L))
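The `dput()` output shows that the date columns are factors, so any join or lookup compares their text labels rather than actual dates. A possible sketch, assuming real `Date` values are preferred (for example, to sort or to derive months), converts them through `as.character()` first so that the labels, not the integer codes, are parsed. The helper name `to_date` is hypothetical.

```r
# The answer's z1, exactly as printed by dput(): date columns are factors
z1 <- structure(list(BeginDate = structure(1:3, .Label = c("04/01/2002",
  "09/01/2001", "11/01/2016"), class = "factor"), Value = c(350000L,
  220000L, 473000L), EndDate = structure(c(3L, 2L, 1L), .Label = c("01/01/2017",
  "02/07/2016", "06/15/2012"), class = "factor")), .Names = c("BeginDate",
  "Value", "EndDate"), class = "data.frame", row.names = c(NA, -3L))

# Hypothetical helper: parse the factor's labels (not its integer codes)
# as "MM/DD/YYYY" dates
to_date <- function(f) as.Date(as.character(f), format = "%m/%d/%Y")

z1$BeginDate <- to_date(z1$BeginDate)
z1$EndDate   <- to_date(z1$EndDate)
str(z1)
```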