Reshaping multiple groups of variables from wide to long

Time: 2017-11-22 01:31:37

Tags: r dplyr reshape tidyr tidyverse

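This question is very similar to an existing question.

However, I have not been able to extend that approach to multiple groups of variables. This is the dataset I am working with: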

A tibble: 12 x 9
   Month Cabo_BU_PCT Acapulco_BU_PCT Cabo_LOS_AVG Acapulco_LOS_AVG BED_BUGS_Cabo BED_BUGS_Acapulco TOTAL_OCCUPIED_Cabo TOTAL_OCCUPIED_Acapulco

       1   0.6470034       0.6260116     5.223000         4.307667             5                 3               19216                    6498
       2   0.6167027       0.6777457     5.893571         4.247500             3                 0               17095                    6566
       3   0.6372108       0.6348126     5.229677         4.327742             5                 1               19556                    6809
       4   0.6357912       0.6548170     5.356667         4.220000             4                 6               18883                    6797
       5   0.6449006       0.6409659     5.344194         4.162903             2                 5               19792                    6875
       6   0.6747811       0.6935453     5.812667         4.362000             4                 3               20041                    7199
       7   0.6697947       0.6932687     5.544516         4.462903             5                 6               20556                    7436
       8   0.6595960       0.6777923     5.260323         4.135806             0                 7               20243                    7270
       9   0.6792256       0.6863198     5.424333         4.133333             5                 0               20173                    7124
      10   0.6976214       0.7370875     5.419677         4.350000             3                 3               21410                    7906
      11   0.6600337       0.6615607     5.450000         4.184333             3                 2               19603                    6867
      12   0.6761812       0.6773261     5.347097         4.318710             2                 2               20752                    7265

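My goal is to reshape it into the long format shown below: the columns Cabo_BU_PCT and Acapulco_BU_PCT are stacked into a single column named BU_PCT, the columns Cabo_LOS_AVG and Acapulco_LOS_AVG are stacked into a single column named LOS_AVG, and so on for the remaining groups.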

  Month    Location    BU_PCT      LOS_AVG     BED_BUGS       TOTAL_OCCUPIED
  1        Cabo        0.6470034   5.223000    5              19216
  1        Acapulco    0.6260116   4.307667    3              6498
  2        Cabo        0.6167027   5.893571    3              17095
  2        Acapulco    0.6777457   4.247500    0              6566
  .
  .
  .
  12       Cabo        0.6761812   5.347097    2              20752
  12       Acapulco    0.6773261   4.318710    2              7265  

Any help reshaping this data frame would be much appreciated. Thanks.

======== dataset ===========

df_wide <- structure(list(Month = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
), Cabo_BU_PCT = c(0.647003367003367, 0.616702741702742, 0.637210817855979, 
0.635791245791246, 0.644900619094168, 0.674781144781145, 0.669794721407625, 
0.65959595959596, 0.679225589225589, 0.69762137504073, 0.66003367003367, 
0.676181166503747), Acapulco_BU_PCT = c(0.626011560693642, 0.677745664739884, 
0.634812604885325, 0.654816955684008, 0.640965877307477, 0.69354527938343, 
0.693268692895767, 0.677792280440052, 0.686319845857418, 0.737087451053515, 
0.661560693641619, 0.677326123438374), Cabo_LOS_AVG = c(5.223, 
5.89357142857143, 5.22967741935484, 5.35666666666667, 5.3441935483871, 
5.81266666666667, 5.54451612903226, 5.26032258064516, 5.42433333333333, 
5.41967741935484, 5.45, 5.34709677419355), Acapulco_LOS_AVG = c(4.30766666666667, 
4.2475, 4.32774193548387, 4.22, 4.16290322580645, 4.362, 4.46290322580645, 
4.1358064516129, 4.13333333333333, 4.35, 4.18433333333333, 4.31870967741935
), BED_BUGS_Cabo = c(5, 3, 5, 4, 2, 4, 5, 0, 5, 3, 3, 2), BED_BUGS_Acapulco = c(3, 
0, 1, 6, 5, 3, 6, 7, 0, 3, 2, 2), TOTAL_OCCUPIED_Cabo = c(19216, 
17095, 19556, 18883, 19792, 20041, 20556, 20243, 20173, 21410, 
19603, 20752), TOTAL_OCCUPIED_Acapulco = c(6498, 6566, 6809, 
6797, 6875, 7199, 7436, 7270, 7124, 7906, 6867, 7265)), class = c("tbl_df", 
"tbl", "data.frame"), .Names = c("Month", "Cabo_BU_PCT", "Acapulco_BU_PCT", 
"Cabo_LOS_AVG", "Acapulco_LOS_AVG", "BED_BUGS_Cabo", "BED_BUGS_Acapulco", 
"TOTAL_OCCUPIED_Cabo", "TOTAL_OCCUPIED_Acapulco"), row.names = c(NA, 
-12L))

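2 answers:

Answer 0 (score: 5)

If you only have two locations, you can put them directly in a regular expression, allowing for the fact that they may appear at either the start or the end of a column name: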

{{1}}
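The code for this answer is not preserved above, so the following is only a sketch of the idea it describes, not the answerer's original code. It assumes the tidyr package (version 1.0 or later, for pivot_longer) is installed and uses a small two-month subset of df_wide to stay self-contained. The helper names loc_rx, loc, stem and df_std are made up for illustration.

```r
library(tidyr)

# Small subset of the question's data (two months, two variable groups)
df_wide <- data.frame(
  Month = c(1, 2),
  Cabo_BU_PCT = c(0.647, 0.617),
  Acapulco_BU_PCT = c(0.626, 0.678),
  BED_BUGS_Cabo = c(5, 3),
  BED_BUGS_Acapulco = c(3, 0)
)

# The two known locations, written out as a regex alternation
loc_rx <- "Cabo|Acapulco"

old  <- names(df_wide)[-1]                            # every column except Month
loc  <- regmatches(old, regexpr(loc_rx, old))         # location part of each name
stem <- gsub(paste0("_?(", loc_rx, ")_?"), "", old)   # name with the location removed

# Rename to a regular "Location.Variable" pattern, e.g. "Cabo.BU_PCT"
df_std <- df_wide
names(df_std)[-1] <- paste(loc, stem, sep = ".")

# ".value" tells pivot_longer to keep one output column per variable stem
df_long <- pivot_longer(
  df_std,
  cols = -Month,
  names_to = c("Location", ".value"),
  names_sep = "\\."
)
```

On the full dataset the same steps should give one row per Month and Location with the columns BU_PCT, LOS_AVG, BED_BUGS and TOTAL_OCCUPIED. If more locations are added later, they have to be appended to loc_rx by hand.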

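Answer 1 (score: 2)

This uses reshape from base R. No packages are used. varying= specifies that columns 2 and 3 are to be combined, likewise columns 4 and 5, and so on. The new columns are given the names specified in v.names=, and the locations are specified in times=.

We could derive the varying=, v.names= and times= arguments from the column names, but because the names are irregular this involves a messy regular expression, so it is simpler to just write them out (we do show how to derive them further below, however).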

The result is sorted by location and then by month within location, but it can be sorted differently if needed.

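# Stack each pair of columns; [-7] drops the 7th column (the id column added by reshape)
df_long <- reshape(df_wide, direction = "long", 
 varying = list(2:3, 4:5, 6:7, 8:9),
 v.names = c("BU_PCT", "LOS_AVG", "BED_BUGS", "TOTAL_OCCUPIED"),
 times = c("Cabo", "Acapulco"))[-7]
# reshape calls the second column "time"; rename it
names(df_long)[2] <- "LOCATION"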
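If the row order from the question's desired output is wanted (month first, then Cabo before Acapulco), the re-sorting mentioned above could be sketched as follows. The small df_long here is a hand-written stand-in for the reshaped result, and loc_order and df_sorted are illustrative names only.

```r
# Stand-in for the reshaped data: sorted by location, then month
df_long <- data.frame(
  Month = c(1, 2, 1, 2),
  LOCATION = c("Cabo", "Cabo", "Acapulco", "Acapulco"),
  BU_PCT = c(0.647, 0.617, 0.626, 0.678),
  stringsAsFactors = FALSE
)

# Desired order of locations within each month
loc_order <- c("Cabo", "Acapulco")

# Sort by Month first, then by position of LOCATION in loc_order
df_sorted <- df_long[order(df_long$Month, match(df_long$LOCATION, loc_order)), ]
rownames(df_sorted) <- NULL   # discard the old row names
```

Using match() against an explicit vector avoids an alphabetical sort, which would put Acapulco before Cabo.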

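Alternatively, if we did want to derive varying=, v.names= and times= from names(df_wide), it could be done like this, where names1 {{1}} without the location names. We use the fact that each location name consists of lowercase letters except for its first letter, and sits at either the beginning or the end of the column name.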

names(df_wide)
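The rest of that code is not preserved here. As a rough illustration only (not the answerer's original code), the three arguments might be derived along these lines. The example uses a small subset of df_wide as a plain data.frame, and the names loc_rx, nms, stems and locs are hypothetical.

```r
# Small subset of the question's data as a plain data.frame
df_wide <- data.frame(
  Month = c(1, 2),
  Cabo_BU_PCT = c(0.647, 0.617),
  Acapulco_BU_PCT = c(0.626, 0.678),
  BED_BUGS_Cabo = c(5, 3),
  BED_BUGS_Acapulco = c(3, 0)
)

# A location is one capital letter followed by lowercase letters,
# attached with "_" at the start or at the end of the column name
loc_rx <- "^[[:upper:]][[:lower:]]+_|_[[:upper:]][[:lower:]]+$"

nms   <- names(df_wide)[-1]                                    # drop the Month column
stems <- gsub(loc_rx, "", nms)                                 # names without location
locs  <- gsub("_", "", regmatches(nms, regexpr(loc_rx, nms)))  # location of each column

v.names <- unique(stems)
times   <- unique(locs)

# For each stem, list its columns in the same location order as times
varying <- lapply(v.names, function(s) {
  nms[stems == s][match(times, locs[stems == s])]
})

df_long <- reshape(df_wide, direction = "long",
                   varying = varying, v.names = v.names,
                   times = times, timevar = "LOCATION", idvar = "Month")
```

Because idvar = "Month" names an existing column, reshape does not append a separate id column in this sketch, so there is nothing to drop afterwards. The pattern relies on variable stems such as BU_PCT being all uppercase; a stem beginning with a capitalised word would be mistaken for a location.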