I have two data frames. The first has about 700,000 rows and the second has 3,000 rows.
The first looks like this:
Date time 20 30 10
201512 1 -142 406 406
201512 2 -554 376 841
201512 3 -378 652 692
201512 4 -841 117 707
201512 5 -187 159 338
201512 6 -637 364 362
The second:
t X20.202009 X30.202009 X10.202009 X20.201512 X30.201512
1 0.001234288 -0.008524849 0.04354567 0.001095770 -0.003047575
2 0.001579689 -0.008521823 0.04357056 0.001297871 -0.003047184
3 0.001925088 -0.007832549 0.04349185 0.001499973 -0.003046793
4 0.002270496 -0.006745983 0.04372217 0.001702075 -0.002563976
5 0.002615893 -0.005659420 0.04362848 0.001904166 -0.001801842
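I tried to merge the two data frames, but the merge fails because of their different lengths. I would like to add three columns to the first data frame holding the corresponding values from the second data frame, matched on time, date, and (A, B or C).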
Answer 0 (score: 1)
Here are your data frames:
df1 <- structure(list(Date = c(201512L, 201512L, 201512L, 201512L, 201512L,
201512L), time = 1:6, `20` = c(-142L, -554L, -378L, -841L, -187L,
-637L), `30` = c(406L, 376L, 652L, 117L, 159L, 364L), `10` = c(406L,
841L, 692L, 707L, 338L, 362L)), class = "data.frame", row.names = c(NA,
-6L))
df2 <- structure(list(t = 1:5, X20.202009 = c(0.001234288, 0.001579689,
0.001925088, 0.002270496, 0.002615893), X30.202009 = c(-0.008524849,
-0.008521823, -0.007832549, -0.006745983, -0.00565942), X10.202009 = c(0.04354567,
0.04357056, 0.04349185, 0.04372217, 0.04362848), X20.201512 = c(0.00109577,
0.001297871, 0.001499973, 0.001702075, 0.001904166), X30.201512 = c(-0.003047575,
-0.003047184, -0.003046793, -0.002563976, -0.001801842)), row.names = c(NA,
-5L), class = "data.frame")
Step 1. Reformat df2 so that it has a time column (t, renamed), a Date column, and one value column per group (X20, X30, X10):
library(dplyr)
library(tidyr)
df2_reformat <- df2 %>%
rename(time = t) %>%
pivot_longer(contains("."), names_to = c("group", "Date"), names_sep = "\\.") %>%
pivot_wider(names_from = group, values_from = value) %>%
mutate(Date = as.integer(Date))
head(df2_reformat)
## A tibble: 6 x 5
# time Date X20 X30 X10
# <int> <int> <dbl> <dbl> <dbl>
#1 1 202009 0.00123 -0.00852 0.0435
#2 1 201512 0.00110 -0.00305 NA
#3 2 202009 0.00158 -0.00852 0.0436
#4 2 201512 0.00130 -0.00305 NA
#5 3 202009 0.00193 -0.00783 0.0435
#6 3 201512 0.00150 -0.00305 NA
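The NA values in X10 for Date 201512 follow from how the column names are split. As a rough illustration only (base R, with the column names copied from the sample df2 above; the name_map object is introduced here just for the sketch), this shows which group and Date pairs actually exist:

```r
# Column names of df2 other than t, copied from the sample data above
cols <- c("X20.202009", "X30.202009", "X10.202009", "X20.201512", "X30.201512")

# Split each name at the literal dot; "\\." escapes the regex wildcard,
# which is the same pattern passed to names_sep in pivot_longer()
parts <- strsplit(cols, "\\.")

# name_map is a hypothetical helper table, not part of the original answer
name_map <- data.frame(
  column = cols,
  group  = sapply(parts, `[`, 1),
  Date   = as.integer(sapply(parts, `[`, 2))
)
name_map

# Count of columns per (group, Date) pair; a 0 means df2 has no such column,
# so pivot_wider() fills that cell with NA
table(name_map$group, name_map$Date)
```

In the sample data there is no X10.201512 column, so X10 is NA for every row with Date 201512.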
Step 2. Merge df1 and df2_reformat by adding the new columns to df1:
df_final <- left_join(df1, df2_reformat)
#Joining, by = c("Date", "time")
df_final
# Date time 20 30 10 X20 X30 X10
#1 201512 1 -142 406 406 0.001095770 -0.003047575 NA
#2 201512 2 -554 376 841 0.001297871 -0.003047184 NA
#3 201512 3 -378 652 692 0.001499973 -0.003046793 NA
#4 201512 4 -841 117 707 0.001702075 -0.002563976 NA
#5 201512 5 -187 159 338 0.001904166 -0.001801842 NA
#6 201512 6 -637 364 362 NA NA NA
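Row 6 is all NA because the sample df2 only has t values 1 to 5. With 700,000 rows it may help to name the join keys explicitly and to list the rows that found no match. A minimal sketch, assuming dplyr is installed; df1_small and df2_small are hypothetical stand-ins that reuse a few values from the data above:

```r
library(dplyr)

# Hypothetical stand-ins for df1 and df2_reformat, values copied from above
df1_small <- data.frame(
  Date = c(201512L, 201512L),
  time = c(5L, 6L),
  `20` = c(-187L, -637L),
  check.names = FALSE  # keep the column name "20" as is
)
df2_small <- data.frame(
  time = c(5L, 5L),
  Date = c(202009L, 201512L),
  X20  = c(0.002615893, 0.001904166)
)

# Naming the keys avoids relying on the automatic "Joining, by" guess
joined <- left_join(df1_small, df2_small, by = c("Date", "time"))
joined

# Rows of df1_small with no partner in df2_small (these get NA after the join)
unmatched <- anti_join(df1_small, df2_small, by = c("Date", "time"))
unmatched
```

left_join keeps every row of the first data frame, so the differing row counts are not a problem as long as each Date and time pair appears at most once in the second one.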