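In R, is there a way to gather a data frame while extracting only part of the column names?

Time: 2020-09-11 02:24:35

Tags: r tidyr data-wrangling

Overview

I am tidying a data frame. I have found a solution to my problem, but it seems inefficient on large data sets. Currently, my code gathers the data frame, applies a separate function to split the ticker from the metric in each column name, and then spreads the data accordingly. See the example below.

Data frame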

    df <- structure(list(date = c("2009-07-01", "2009-07-02", "2009-07-06", 
"2009-07-07", "2009-07-08"), PRED.Open = c(0.5, 0.5, 0.7, 0.7, 
0.7), PRED.High = c(0.5, 0.6, 0.7, 0.7, 0.7), PRED.Low = c(0.5, 
0.5, 0.5, 0.7, 0.7), PRED.Close = c(0.5, 0.6, 0.5, 0.7, 0.7), 
    PRED.Volume = c(0L, 300L, 200L, 0L, 0L), PRED.Adjusted = c(0.5, 
    0.6, 0.5, 0.7, 0.7), GDM.Open = c(1041.02002, 1085.109985, 
    1052.02002, 1011.429993, 1006.630005), GDM.High = c(1097.790039, 
    1085.109985, 1052.02002, 1029.290039, 1006.630005), GDM.Low = c(1041.02002, 
    1038.540039, 995.450012, 1005.280029, 948.73999), GDM.Close = c(1085.109985, 
    1052.02002, 1011.429993, 1006.630005, 966.22998), GDM.Volume = c(0L, 
    0L, 0L, 0L, 0L), GDM.Adjusted = c(1085.109985, 1052.02002, 
    1011.429993, 1006.630005, 966.22998), NBL.Open = c(29.885, 
    29.325001, 27.370001, 27.485001, 26.815001), NBL.High = c(30.35, 
    29.325001, 27.545, 27.610001, 27.18), NBL.Low = c(29.83, 
    28.07, 26.825001, 26.605, 25.745001)), row.names = c(NA, 
-5L), class = "data.frame")

Current solution

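library(dplyr)
library(tidyr)

# Stack every column except date into ticker/val pairs
df <- df %>%  gather(c(2:ncol(df)), key = "ticker", value = "val")

# Split names such as "PRED.Open" into ticker and metric, then spread the metrics into columns
df <- separate(df, col = "ticker", into = c("ticker", "metric"), sep = "\\.") %>% 
  ungroup() %>% 
  spread(key = "metric", value = "val") %>% 
  arrange(ticker, date)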

Desired result

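[Screenshot of the desired output, not reproduced here: one row per ticker and date, with the metrics (Open, High, Low, Close, Volume, Adjusted) as columns.]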

Question

Is there a more efficient way to achieve this?

1 answer:

Answer 0: (score: 1)

If you are using tidyr 1.0.0 or later, pivot_longer can do this in a single call:

tidyr::pivot_longer(df, 
                    cols = -date, 
                    names_to = c('ticker', '.value'), 
                    names_sep = '\\.') %>%
dplyr::arrange(ticker, date)

# A tibble: 15 x 8
#   date       ticker     Open     High      Low   Close Volume Adjusted
#   <chr>      <chr>     <dbl>    <dbl>    <dbl>   <dbl>  <int>    <dbl>
# 1 2009-07-01 GDM    1041.0   1097.8   1041.0   1085.1       0  1085.1 
# 2 2009-07-02 GDM    1085.1   1085.1   1038.5   1052.0       0  1052.0 
# 3 2009-07-06 GDM    1052.0   1052.0    995.45  1011.4       0  1011.4 
# 4 2009-07-07 GDM    1011.4   1029.3   1005.3   1006.6       0  1006.6 
# 5 2009-07-08 GDM    1006.6   1006.6    948.74   966.23      0   966.23
# 6 2009-07-01 NBL      29.885   30.35    29.83    NA        NA    NA   
# 7 2009-07-02 NBL      29.325   29.325   28.07    NA        NA    NA   
# 8 2009-07-06 NBL      27.370   27.545   26.825   NA        NA    NA   
# 9 2009-07-07 NBL      27.485   27.610   26.605   NA        NA    NA   
#10 2009-07-08 NBL      26.815   27.18    25.745   NA        NA    NA   
#11 2009-07-01 PRED      0.5      0.5      0.5      0.5       0     0.5 
#12 2009-07-02 PRED      0.5      0.6      0.5      0.6     300     0.6 
#13 2009-07-06 PRED      0.7      0.7      0.5      0.5     200     0.5 
#14 2009-07-07 PRED      0.7      0.7      0.7      0.7       0     0.7 
#15 2009-07-08 PRED      0.7      0.7      0.7      0.7       0     0.7
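
To make the role of the special '.value' entry in names_to easier to see, here is a minimal sketch on a small made-up data frame. The tickers AAA and BBB and all the numbers are hypothetical and are not taken from the question's data. The part of each column name before the dot becomes the ticker column, while the part after the dot is used as the name of an output column rather than being stored as a value.

```r
library(tidyr)

# Hypothetical toy data: two tickers (AAA, BBB) with two metrics each
toy <- data.frame(
  date      = c("2020-01-01", "2020-01-02"),
  AAA.Open  = c(1, 2),
  AAA.Close = c(3, 4),
  BBB.Open  = c(5, 6),
  BBB.Close = c(7, 8)
)

# "ticker" receives the text before the dot; ".value" tells
# pivot_longer() to use the text after the dot as output column names
long <- pivot_longer(
  toy,
  cols      = -date,
  names_to  = c("ticker", ".value"),
  names_sep = "\\."
)

print(long)
```

The result should have one row per date and ticker, with Open and Close as ordinary columns, so the separate and spread steps from the question are not needed.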