Joining a data frame to a nested tibble

Time: 2019-05-28 18:31:37

Tags: r

I have a nested tibble:

> df1
# A tibble: 2 x 2
  period          data            
  <chr>           <list>          
1 Jy_2014_Je_2015 <tibble [7 x 7]>
2 Jy_2013_Je_2014 <tibble [3 x 7]>

Unnested, it looks like this:

> unnest(df1)
# A tibble: 10 x 8
   period          ID       CompYear  filing_date  mgnt  risk start_date end_date  
   <chr>           <chr>    <chr>     <date>      <dbl> <dbl> <date>     <date>    
 1 Jy_2014_Je_2015 71327810 2013_2014 2014-08-20  0.871 0.749 2015-06-01 2016-05-30
 2 Jy_2014_Je_2015 56357140 2014_2015 2015-02-24  0.915 0.958 2015-06-01 2016-05-30
 3 Jy_2014_Je_2015 71340910 2014_2015 2015-02-10  0.787 0.934 2015-06-01 2016-05-30
 4 Jy_2014_Je_2015 09367110 2013_2014 2014-08-26  0.852 0.750 2015-06-01 2016-05-30
 5 Jy_2014_Je_2015 G5785G10 2014_2015 2015-02-26  0.966 0.991 2015-06-01 2016-05-30
 6 Jy_2014_Je_2015 20825150 2014_2015 2015-03-02  0.966 0.973 2015-06-01 2016-05-30
 7 Jy_2014_Je_2015 55616P10 2014_2015 2015-02-23  0.991 0.920 2015-06-01 2016-05-30
 8 Jy_2013_Je_2014 08499Z00 2013_2014 2014-03-19  0.936 0.282 2014-06-01 2015-05-30
 9 Jy_2013_Je_2014 59268810 2013_2014 2014-02-21  0.952 0.911 2014-06-01 2015-05-30
10 Jy_2013_Je_2014 01858110 2013_2014 2014-01-31  0.953 0.966 2014-06-01 2015-05-30

df2 looks like this:

> df2
           ID       date   Var1   date_ret
  1: 01858110 2012-01-31 110.80 2012-01-31
  2: 01858110 2012-02-29 121.36 2012-02-29
  3: 01858110 2012-03-30 125.96 2012-03-30
  4: 01858110 2012-04-30 128.49 2012-04-30
  5: 01858110 2012-05-31 126.00 2012-05-31
 ---                                      
231: G5785G10 2014-08-29  81.49 2014-08-29
232: G5785G10 2014-09-30  90.15 2014-09-30
233: G5785G10 2014-10-31  92.18 2014-10-31
234: G5785G10 2014-11-28  92.22 2014-11-28
235: G5785G10 2014-12-31  99.03 2014-12-31

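I am trying to join df2 with df1. df2 is a plain data frame (a data.table), while df1 is a nested tibble. Both contain the ID variable that I want to join on, and both contain dates: df2 holds monthly dates, while df1 holds yearly dates. I therefore created start_date and end_date to mark where the join window begins and ends. That is, I want to join df2 to df1 by ID, keeping the monthly rows of df2 whose date falls between start_date and end_date in df1.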
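To make the intended matching rule concrete, here is a minimal base R sketch on toy data. The tables monthly and yearly and all their values are hypothetical stand-ins for df2 and the unnested df1. It assumes the window is open at the start and closed at the end (date > start_date and date <= end_date). It also keeps only matching rows, unlike a data.table join of the form df2[df1, on = ...], which keeps unmatched rows of df1 filled with NA.

```r
# Hypothetical toy data: monthly observations (like df2)
monthly <- data.frame(
  ID   = c("A", "A", "A", "B"),
  date = as.Date(c("2014-05-30", "2014-06-30", "2015-05-29", "2014-07-31")),
  Var1 = c(10, 11, 12, 20),
  stringsAsFactors = FALSE
)

# Hypothetical yearly windows (like the unnested df1)
yearly <- data.frame(
  ID         = c("A", "B"),
  start_date = as.Date(c("2014-06-01", "2015-06-01")),
  end_date   = as.Date(c("2015-05-30", "2016-05-30")),
  stringsAsFactors = FALSE
)

# Step 1: equi-join on ID, pairing every monthly row with its ID's window
candidates <- merge(monthly, yearly, by = "ID")

# Step 2: keep only the rows whose date lies inside (start_date, end_date]
in_window <- candidates[candidates$date > candidates$start_date &
                          candidates$date <= candidates$end_date, ]

print(in_window)
```

Only the two rows for ID "A" dated 2014-06-30 and 2015-05-29 survive; the row for "B" is dropped because its date precedes the window.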

As far as I understand, the way to do this is to unnest the data in df1 and join it separately:

unnested_data <- unnest(df1)

myJoinedData <- setDT(df2)[, date_ret:=date][data.table(unnested_data), 
                                                      .(ID, date, Var1, 
                                                        period, filing_date, mgnt, start_date, end_date), 
                                                      on = .(ID, date > start_date, date <= end_date)]

Data to join:

    df1 <- structure(list(period = c("Jy_2014_Je_2015", "Jy_2013_Je_2014"
), data = list(structure(list(ID = c("71327810", "56357140", 
"71340910", "09367110", "G5785G10", "20825150", "55616P10"), 
    CompYear = c("2013_2014", "2014_2015", "2014_2015", "2013_2014", 
    "2014_2015", "2014_2015", "2014_2015"), filing_date = structure(c(16302, 
    16490, 16476, 16308, 16492, 16496, 16489), class = "Date"), 
    mgnt = c(0.871267898855628, 0.915166869000075, 0.786638982683625, 
    0.852085258472343, 0.965682470009356, 0.965813590885971, 
    0.990809585984218), risk = c(0.748733465269009, 0.958314403117101, 
    0.934083811166365, 0.749665010947671, 0.990592523426367, 
    0.973022180801192, 0.920155039512913), start_date = structure(c(16587, 
    16587, 16587, 16587, 16587, 16587, 16587), class = "Date"), 
    end_date = structure(c(16951, 16951, 16951, 16951, 16951, 
    16951, 16951), class = "Date")), row.names = c(NA, -7L), class = c("tbl_df", 
"tbl", "data.frame")), structure(list(ID = c("08499Z00", "59268810", 
"01858110"), CompYear = c("2013_2014", "2013_2014", "2013_2014"
), filing_date = structure(c(16148, 16122, 16101), class = "Date"), 
    mgnt = c(0.936233012426363, 0.951806853733034, 0.953036137199852
    ), risk = c(0.281925228286195, 0.911441002298349, 0.966068420793039
    ), start_date = structure(c(16222, 16222, 16222), class = "Date"), 
    end_date = structure(c(16585, 16585, 16585), class = "Date")), row.names = c(NA, 
-3L), class = c("tbl_df", "tbl", "data.frame")))), row.names = c(NA, 
-2L), class = c("tbl_df", "tbl", "data.frame"))

1 Answer:

Answer 0 (score: 1)

We can loop over the list column with map and perform the join on each element:

library(tidyverse)
library(data.table)
df1 %>% 
  mutate(data = map(data, ~ 
       as.data.table(df2)[, date_ret := date][data.table(.x),
     .(ID, date, Var1, filing_date, mgnt, start_date, end_date),
       on = .(ID, date > start_date, date <= end_date)]))
# A tibble: 2 x 2
#  period          data             
#  <chr>           <list>           
#1 Jy_2014_Je_2015 <df[,7] [7 × 7]> 
#2 Jy_2013_Je_2014 <df[,7] [15 × 7]>
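The same per-element pattern can be sketched in base R with lapply, which may help to see what map is doing to the list column. Everything here is hypothetical toy data (monthly, nested, and the helper join_window are illustrative names, not from the question), and the join keeps only matching rows rather than reproducing the data.table join exactly.

```r
# Hypothetical monthly data (stand-in for df2)
monthly <- data.frame(
  ID   = c("A", "A", "B"),
  date = as.Date(c("2014-06-30", "2015-06-30", "2014-07-31")),
  Var1 = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Hypothetical nested table (stand-in for df1): one data frame per period
nested <- data.frame(period = c("P1", "P2"), stringsAsFactors = FALSE)
nested$data <- list(
  data.frame(ID = "A",
             start_date = as.Date("2014-06-01"),
             end_date   = as.Date("2015-05-30"),
             stringsAsFactors = FALSE),
  data.frame(ID = c("A", "B"),
             start_date = as.Date("2015-06-01"),
             end_date   = as.Date("2016-05-30"),
             stringsAsFactors = FALSE)
)

# Illustrative helper: join one nested element to the monthly data
join_window <- function(win) {
  m <- merge(monthly, win, by = "ID")
  m[m$date > m$start_date & m$date <= m$end_date, ]
}

# Apply the join to each element of the list column, keeping the nesting
nested$data <- lapply(nested$data, join_window)

print(vapply(nested$data, nrow, integer(1)))
```

Each period keeps its own joined data frame, so the result stays nested in the same shape as the input.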