How to build a data frame using information from two other data frames

Posted: 2020-11-12 17:37:37

Tags: r dataframe merge

I have two data frames: the first has about 700,000 rows and the second has 3,000 rows.

The first looks like this:

  Date time   20  30  10
201512    1 -142 406 406
201512    2 -554 376 841
201512    3 -378 652 692
201512    4 -841 117 707
201512    5 -187 159 338
201512    6 -637 364 362

The second:

t  X20.202009   X30.202009 X10.202009  X20.201512   X30.201512
1 0.001234288 -0.008524849 0.04354567 0.001095770 -0.003047575
2 0.001579689 -0.008521823 0.04357056 0.001297871 -0.003047184
3 0.001925088 -0.007832549 0.04349185 0.001499973 -0.003046793
4 0.002270496 -0.006745983 0.04372217 0.001702075 -0.002563976
5 0.002615893 -0.005659420 0.04362848 0.001904166 -0.001801842

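I tried to merge the two data frames, but it failed because they have different numbers of rows. I want to add three columns holding the corresponding numbers from the second data frame, matched on time, Date and group (A, B or C).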
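Because each value in the second data frame is addressed by a row (t) and a column name that combines group and Date (for example X20.201512), the requirement can also be read as a lookup rather than a merge. The block below is a minimal base R sketch of that idea, a different technique from the reshape-and-join in the answer. It assumes only the key columns of the first data frame and the 201512 columns of the second, copied from the question; the names m, i and j are illustrative.

```r
# Key columns of the first data frame and the 201512 columns of the second,
# copied from the question
df1 <- data.frame(Date = rep(201512L, 6), time = 1:6)
df2 <- data.frame(
  t          = 1:5,
  X20.201512 = c(0.00109577, 0.001297871, 0.001499973, 0.001702075, 0.001904166),
  X30.201512 = c(-0.003047575, -0.003047184, -0.003046793, -0.002563976, -0.001801842)
)

m <- as.matrix(df2)            # numeric matrix, indexable by (row, column) pairs
i <- match(df1$time, df2$t)    # row of df2 for each row of df1 (NA if t is missing)

for (g in c("20", "30", "10")) {
  # Column of df2 named e.g. "X20.201512" (NA if df2 has no such column)
  j <- match(paste0("X", g, ".", df1$Date), colnames(m))
  # An index-matrix row containing NA yields NA in the result
  df1[[paste0("X", g)]] <- m[cbind(i, j)]
}
print(df1)
```

Rows of the first data frame with no matching t (time 6 here), and groups with no matching column (X10 for 201512), come out as NA.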

1 answer:

Answer 0 (score: 1)

Here are your data frames:

df1 <- structure(list(Date = c(201512L, 201512L, 201512L, 201512L, 201512L, 
201512L), time = 1:6, `20` = c(-142L, -554L, -378L, -841L, -187L, 
-637L), `30` = c(406L, 376L, 652L, 117L, 159L, 364L), `10` = c(406L, 
841L, 692L, 707L, 338L, 362L)), class = "data.frame", row.names = c(NA, 
-6L))

df2 <- structure(list(t = 1:5, X20.202009 = c(0.001234288, 0.001579689, 
0.001925088, 0.002270496, 0.002615893), X30.202009 = c(-0.008524849, 
-0.008521823, -0.007832549, -0.006745983, -0.00565942), X10.202009 = c(0.04354567, 
0.04357056, 0.04349185, 0.04372217, 0.04362848), X20.201512 = c(0.00109577, 
0.001297871, 0.001499973, 0.001702075, 0.001904166), X30.201512 = c(-0.003047575, 
-0.003047184, -0.003046793, -0.002563976, -0.001801842)), row.names = c(NA, 
-5L), class = "data.frame")

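Step 1. Reshape df2 so that each row holds one time (the t column, renamed to time) and one Date, with the values for each group in their own columns: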

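library(dplyr)
library(tidyr)

df2_reformat <- df2 %>% 
    rename(time = t) %>%    # use the same key name as df1
    # long form: split e.g. "X20.201512" into group "X20" and Date "201512"
    pivot_longer(contains("."), names_to = c("group", "Date"), names_sep = "\\.") %>% 
    # back to wide: one row per (time, Date), one column per group
    pivot_wider(names_from = group, values_from = value) %>% 
    # Date comes out of the column names as character; df1 stores it as integer
    mutate(Date = as.integer(Date))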

head(df2_reformat)
## A tibble: 6 x 5
#   time   Date     X20      X30     X10
#  <int>  <int>   <dbl>    <dbl>   <dbl>
#1     1 202009 0.00123 -0.00852  0.0435
#2     1 201512 0.00110 -0.00305 NA     
#3     2 202009 0.00158 -0.00852  0.0436
#4     2 201512 0.00130 -0.00305 NA     
#5     3 202009 0.00193 -0.00783  0.0435
#6     3 201512 0.00150 -0.00305 NA    
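If tidyr is not available, the same reshaping can be sketched in base R. This swaps in stats::reshape() for pivot_longer()/pivot_wider(); df2 is retyped from the dput() output above so the block runs on its own, and the names long and wide are illustrative. The final check confirms that each (time, Date) pair occurs only once, which the join in Step 2 relies on so that rows of df1 are not duplicated.

```r
# df2 retyped from the dput() output in the answer
df2 <- data.frame(
  t          = 1:5,
  X20.202009 = c(0.001234288, 0.001579689, 0.001925088, 0.002270496, 0.002615893),
  X30.202009 = c(-0.008524849, -0.008521823, -0.007832549, -0.006745983, -0.00565942),
  X10.202009 = c(0.04354567, 0.04357056, 0.04349185, 0.04372217, 0.04362848),
  X20.201512 = c(0.00109577, 0.001297871, 0.001499973, 0.001702075, 0.001904166),
  X30.201512 = c(-0.003047575, -0.003047184, -0.003046793, -0.002563976, -0.001801842)
)

value_cols <- setdiff(names(df2), "t")

# Long form: one row per (time, original column)
long <- data.frame(
  time  = rep(df2$t, times = length(value_cols)),
  name  = rep(value_cols, each = nrow(df2)),
  value = unlist(df2[value_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)
long$group <- sub("\\..*$", "", long$name)                  # "X20.201512" -> "X20"
long$Date  <- as.integer(sub("^[^.]*\\.", "", long$name))   # "X20.201512" -> 201512

# Wide again: one row per (time, Date), one column per group;
# missing combinations (no X10 for 201512) become NA
wide <- reshape(long[c("time", "Date", "group", "value")],
                idvar = c("time", "Date"), timevar = "group",
                direction = "wide")
names(wide) <- sub("^value\\.", "", names(wide))   # "value.X20" -> "X20"
rownames(wide) <- NULL

# The join key must be unique, otherwise the join would duplicate rows
stopifnot(anyDuplicated(wide[c("time", "Date")]) == 0)
print(wide)
```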

Step 2. Left-join df2_reformat onto df1, which keeps every row of df1 and adds the new columns to it:

df_final <- left_join(df1, df2_reformat)
#Joining, by = c("Date", "time")

df_final
#    Date time   20  30  10         X20          X30 X10
#1 201512    1 -142 406 406 0.001095770 -0.003047575  NA
#2 201512    2 -554 376 841 0.001297871 -0.003047184  NA
#3 201512    3 -378 652 692 0.001499973 -0.003046793  NA
#4 201512    4 -841 117 707 0.001702075 -0.002563976  NA
#5 201512    5 -187 159 338 0.001904166 -0.001801842  NA
#6 201512    6 -637 364 362          NA           NA  NA
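For comparison, the same left join can be sketched with base R's merge(), where all.x = TRUE keeps every row of df1 and fills NA where there is no match (time 6 above). The table df2_long is a hypothetical stand-in typed in by hand with the same rows as df2_reformat, so the block is self-contained. Note that merge() sorts the result by the key columns by default (sort = TRUE), whereas left_join() keeps df1's row order.

```r
df1 <- data.frame(
  Date = rep(201512L, 6),
  time = 1:6,
  `20` = c(-142L, -554L, -378L, -841L, -187L, -637L),
  `30` = c(406L, 376L, 652L, 117L, 159L, 364L),
  `10` = c(406L, 841L, 692L, 707L, 338L, 362L),
  check.names = FALSE   # keep the column names 20, 30, 10 as they are
)

# Hypothetical stand-in for df2_reformat: one row per (time, Date)
df2_long <- data.frame(
  time = rep(1:5, times = 2),
  Date = rep(c(202009L, 201512L), each = 5),
  X20  = c(0.001234288, 0.001579689, 0.001925088, 0.002270496, 0.002615893,
           0.00109577, 0.001297871, 0.001499973, 0.001702075, 0.001904166),
  X30  = c(-0.008524849, -0.008521823, -0.007832549, -0.006745983, -0.00565942,
           -0.003047575, -0.003047184, -0.003046793, -0.002563976, -0.001801842),
  X10  = c(0.04354567, 0.04357056, 0.04349185, 0.04372217, 0.04362848, rep(NA, 5))
)

# all.x = TRUE makes this a left join: unmatched rows of df1 get NA in X20/X30/X10
df_final <- merge(df1, df2_long, by = c("Date", "time"), all.x = TRUE)
print(df_final)
```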