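Overview
I am tidying a data frame. I have a solution that works, but it seems inefficient on large datasets. Currently, my code gathers the data frame into long form, uses separate to split the ticker from the metric in each column name, and then spreads the metrics back out into columns. See the example below.
Data frame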
df <- structure(list(date = c("2009-07-01", "2009-07-02", "2009-07-06",
"2009-07-07", "2009-07-08"), PRED.Open = c(0.5, 0.5, 0.7, 0.7,
0.7), PRED.High = c(0.5, 0.6, 0.7, 0.7, 0.7), PRED.Low = c(0.5,
0.5, 0.5, 0.7, 0.7), PRED.Close = c(0.5, 0.6, 0.5, 0.7, 0.7),
PRED.Volume = c(0L, 300L, 200L, 0L, 0L), PRED.Adjusted = c(0.5,
0.6, 0.5, 0.7, 0.7), GDM.Open = c(1041.02002, 1085.109985,
1052.02002, 1011.429993, 1006.630005), GDM.High = c(1097.790039,
1085.109985, 1052.02002, 1029.290039, 1006.630005), GDM.Low = c(1041.02002,
1038.540039, 995.450012, 1005.280029, 948.73999), GDM.Close = c(1085.109985,
1052.02002, 1011.429993, 1006.630005, 966.22998), GDM.Volume = c(0L,
0L, 0L, 0L, 0L), GDM.Adjusted = c(1085.109985, 1052.02002,
1011.429993, 1006.630005, 966.22998), NBL.Open = c(29.885,
29.325001, 27.370001, 27.485001, 26.815001), NBL.High = c(30.35,
29.325001, 27.545, 27.610001, 27.18), NBL.Low = c(29.83,
28.07, 26.825001, 26.605, 25.745001)), row.names = c(NA,
-5L), class = "data.frame")
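Current solution
library(tidyr)
library(dplyr)
# Stack every column except date into ticker/val pairs
df <- df %>% gather(c(2:ncol(df)), key = "ticker", value = "val")
# Split names such as "PRED.Open" into ticker and metric, then spread the metrics into columns
df <- separate(df, col = "ticker", into = c("ticker", "metric"), sep = "\\.") %>%
  ungroup() %>%
  spread(key = "metric", value = "val") %>%
  arrange(ticker, date)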
Desired result
Question
Is there a more efficient way to achieve this?
Answer 0 (score: 1)
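If you use pivot_longer from tidyr 1.0.0 or later, you can do this in a single step. The special ".value" entry in names_to tells pivot_longer that the part of each column name after the dot (Open, High, and so on) names an output column, so the values never have to be stacked into one column and spread out again: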
tidyr::pivot_longer(df,
                    cols = -date,
                    names_to = c('ticker', '.value'),
                    names_sep = '\\.') %>%
  dplyr::arrange(ticker, date)
# A tibble: 15 x 8
# date ticker Open High Low Close Volume Adjusted
# <chr> <chr> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
# 1 2009-07-01 GDM 1041.0 1097.8 1041.0 1085.1 0 1085.1
# 2 2009-07-02 GDM 1085.1 1085.1 1038.5 1052.0 0 1052.0
# 3 2009-07-06 GDM 1052.0 1052.0 995.45 1011.4 0 1011.4
# 4 2009-07-07 GDM 1011.4 1029.3 1005.3 1006.6 0 1006.6
# 5 2009-07-08 GDM 1006.6 1006.6 948.74 966.23 0 966.23
# 6 2009-07-01 NBL 29.885 30.35 29.83 NA NA NA
# 7 2009-07-02 NBL 29.325 29.325 28.07 NA NA NA
# 8 2009-07-06 NBL 27.370 27.545 26.825 NA NA NA
# 9 2009-07-07 NBL 27.485 27.610 26.605 NA NA NA
#10 2009-07-08 NBL 26.815 27.18 25.745 NA NA NA
#11 2009-07-01 PRED 0.5 0.5 0.5 0.5 0 0.5
#12 2009-07-02 PRED 0.5 0.6 0.5 0.6 300 0.6
#13 2009-07-06 PRED 0.7 0.7 0.5 0.5 200 0.5
#14 2009-07-07 PRED 0.7 0.7 0.7 0.7 0 0.7
#15 2009-07-08 PRED 0.7 0.7 0.7 0.7 0 0.7
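To check that the pivot_longer call is a drop-in replacement for the gather/separate/spread pipeline, the two can be compared on a small input. The following is a minimal sketch, not part of the original answer: the data frame toy and its tickers AAA and BBB are made-up illustrations that only mimic the TICKER.Metric naming scheme from the question.

```r
library(tidyr)
library(dplyr)

# Hypothetical toy data using the same "TICKER.Metric" column naming scheme
toy <- data.frame(
  date      = c("2009-07-01", "2009-07-02"),
  AAA.Open  = c(1, 2),
  AAA.Close = c(1.5, 2.5),
  BBB.Open  = c(10, 20),
  BBB.Close = c(11, 21),
  stringsAsFactors = FALSE
)

# Approach from the question: gather, split the column name, spread again
old_way <- toy %>%
  gather(key = "ticker", value = "val", -date) %>%
  separate(col = "ticker", into = c("ticker", "metric"), sep = "\\.") %>%
  spread(key = "metric", value = "val") %>%
  arrange(ticker, date)

# Approach from the answer: one pivot_longer call with the ".value" sentinel
new_way <- toy %>%
  pivot_longer(cols = -date,
               names_to = c("ticker", ".value"),
               names_sep = "\\.") %>%
  arrange(ticker, date)

# spread() orders the metric columns alphabetically, pivot_longer() keeps
# their order of appearance, so align the columns before comparing
same <- all.equal(as.data.frame(old_way),
                  as.data.frame(new_way[names(old_way)]),
                  check.attributes = FALSE)
print(same)
```

The only differences to expect are the column order and the return type (pivot_longer returns a tibble), which is why the comparison reorders the columns and converts both results to plain data frames first.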