I have a nested tibble:
> df1
# A tibble: 2 x 2
period data
<chr> <list>
1 Jy_2014_Je_2015 <tibble [7 x 7]>
2 Jy_2013_Je_2014 <tibble [3 x 7]>
When unnested, the nested data looks like this:
> unnest(df1)
# A tibble: 10 x 8
period ID CompYear filing_date mgnt risk start_date end_date
<chr> <chr> <chr> <date> <dbl> <dbl> <date> <date>
1 Jy_2014_Je_2015 71327810 2013_2014 2014-08-20 0.871 0.749 2015-06-01 2016-05-30
2 Jy_2014_Je_2015 56357140 2014_2015 2015-02-24 0.915 0.958 2015-06-01 2016-05-30
3 Jy_2014_Je_2015 71340910 2014_2015 2015-02-10 0.787 0.934 2015-06-01 2016-05-30
4 Jy_2014_Je_2015 09367110 2013_2014 2014-08-26 0.852 0.750 2015-06-01 2016-05-30
5 Jy_2014_Je_2015 G5785G10 2014_2015 2015-02-26 0.966 0.991 2015-06-01 2016-05-30
6 Jy_2014_Je_2015 20825150 2014_2015 2015-03-02 0.966 0.973 2015-06-01 2016-05-30
7 Jy_2014_Je_2015 55616P10 2014_2015 2015-02-23 0.991 0.920 2015-06-01 2016-05-30
8 Jy_2013_Je_2014 08499Z00 2013_2014 2014-03-19 0.936 0.282 2014-06-01 2015-05-30
9 Jy_2013_Je_2014 59268810 2013_2014 2014-02-21 0.952 0.911 2014-06-01 2015-05-30
10 Jy_2013_Je_2014 01858110 2013_2014 2014-01-31 0.953 0.966 2014-06-01 2015-05-30
df2 looks like this:
> df2
ID date Var1 date_ret
1: 01858110 2012-01-31 110.80 2012-01-31
2: 01858110 2012-02-29 121.36 2012-02-29
3: 01858110 2012-03-30 125.96 2012-03-30
4: 01858110 2012-04-30 128.49 2012-04-30
5: 01858110 2012-05-31 126.00 2012-05-31
---
231: G5785G10 2014-08-29 81.49 2014-08-29
232: G5785G10 2014-09-30 90.15 2014-09-30
233: G5785G10 2014-10-31 92.18 2014-10-31
234: G5785G10 2014-11-28 92.22 2014-11-28
235: G5785G10 2014-12-31 99.03 2014-12-31
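I am trying to join df2 and df1 together. df2 is a regular data frame, while df1 is a nested tibble. Both contain the ID variable I want to join on, as well as dates: df2 contains monthly dates, while df1 contains yearly dates. So I created start_date and end_date to mark where each join window starts and ends. That is, I want to join by ID and by the date variable in df2, matching the monthly data in df2 that falls between start_date and end_date in df1.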
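To make the intended matching rule concrete, here is a minimal sketch of a data.table non-equi join on toy data. The tables yearly and monthly and all of their values are hypothetical stand-ins for df1 and df2, not the real data; x.date is used in j to ask explicitly for the monthly date from the left-hand table.

```r
library(data.table)

# Hypothetical yearly table: one join window per ID
yearly <- data.table(
  ID         = c("A", "B"),
  start_date = as.Date(c("2014-06-01", "2014-06-01")),
  end_date   = as.Date(c("2015-05-30", "2015-05-30"))
)

# Hypothetical monthly table: several dated observations per ID
monthly <- data.table(
  ID   = c("A", "A", "A", "B"),
  date = as.Date(c("2014-05-30", "2014-07-31", "2015-06-30", "2014-12-31")),
  Var1 = c(1, 2, 3, 4)
)

# For each row of yearly, pick the monthly rows with the same ID
# and start_date < date <= end_date
joined <- monthly[yearly,
  .(ID, date = x.date, Var1, start_date, end_date),
  on = .(ID, date > start_date, date <= end_date)]

print(joined)
```

Only the monthly rows that fall inside each ID's window are matched; a yearly row with no matching month would still appear once, with NA in the monthly columns.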
What I have so far unnests the df1 data and joins it separately:
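library(tidyr)
library(data.table)
# Flatten the nested tibble; naming cols explicitly avoids the warning in newer tidyr
unnested_data <- unnest(df1, cols = data)
# Non-equi join on ID with start_date < date <= end_date; date_ret keeps a copy of the monthly date
myJoinedData <- setDT(df2)[, date_ret := date][data.table(unnested_data),
    .(ID, date, Var1,
      period, filing_date, mgnt, start_date, end_date),
    on = .(ID, date > start_date, date <= end_date)]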
Data to join:
df1 <- structure(list(period = c("Jy_2014_Je_2015", "Jy_2013_Je_2014"
), data = list(structure(list(ID = c("71327810", "56357140",
"71340910", "09367110", "G5785G10", "20825150", "55616P10"),
CompYear = c("2013_2014", "2014_2015", "2014_2015", "2013_2014",
"2014_2015", "2014_2015", "2014_2015"), filing_date = structure(c(16302,
16490, 16476, 16308, 16492, 16496, 16489), class = "Date"),
mgnt = c(0.871267898855628, 0.915166869000075, 0.786638982683625,
0.852085258472343, 0.965682470009356, 0.965813590885971,
0.990809585984218), risk = c(0.748733465269009, 0.958314403117101,
0.934083811166365, 0.749665010947671, 0.990592523426367,
0.973022180801192, 0.920155039512913), start_date = structure(c(16587,
16587, 16587, 16587, 16587, 16587, 16587), class = "Date"),
end_date = structure(c(16951, 16951, 16951, 16951, 16951,
16951, 16951), class = "Date")), row.names = c(NA, -7L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(ID = c("08499Z00", "59268810",
"01858110"), CompYear = c("2013_2014", "2013_2014", "2013_2014"
), filing_date = structure(c(16148, 16122, 16101), class = "Date"),
mgnt = c(0.936233012426363, 0.951806853733034, 0.953036137199852
), risk = c(0.281925228286195, 0.911441002298349, 0.966068420793039
), start_date = structure(c(16222, 16222, 16222), class = "Date"),
end_date = structure(c(16585, 16585, 16585), class = "Date")), row.names = c(NA,
-3L), class = c("tbl_df", "tbl", "data.frame")))), row.names = c(NA,
-2L), class = c("tbl_df", "tbl", "data.frame"))
Answer 0 (score: 1)
We can loop over the list column with map and do the join inside each nested tibble:
library(tidyverse)
library(data.table)
df1 %>%
mutate(data = map(data, ~
as.data.table(df2)[, date_ret := date][data.table(.x),
.(ID, date, Var1, filing_date, mgnt, start_date, end_date),
on = .(ID, date > start_date, date <= end_date)]))
# A tibble: 2 x 2
# period data
# <chr> <list>
#1 Jy_2014_Je_2015 <df[,7] [7 × 7]>
#2 Jy_2013_Je_2014 <df[,7] [15 × 7]>