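This question is very similar to an existing question.
However, I could not extend that approach to multiple groups of variables. This is the dataset I am working with: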
A tibble: 12 x 9
Month Cabo_BU_PCT Acapulco_BU_PCT Cabo_LOS_AVG Acapulco_LOS_AVG BED_BUGS_Cabo BED_BUGS_Acapulco TOTAL_OCCUPIED_Cabo TOTAL_OCCUPIED_Acapulco
1 0.6470034 0.6260116 5.223000 4.307667 5 3 19216 6498
2 0.6167027 0.6777457 5.893571 4.247500 3 0 17095 6566
3 0.6372108 0.6348126 5.229677 4.327742 5 1 19556 6809
4 0.6357912 0.6548170 5.356667 4.220000 4 6 18883 6797
5 0.6449006 0.6409659 5.344194 4.162903 2 5 19792 6875
6 0.6747811 0.6935453 5.812667 4.362000 4 3 20041 7199
7 0.6697947 0.6932687 5.544516 4.462903 5 6 20556 7436
8 0.6595960 0.6777923 5.260323 4.135806 0 7 20243 7270
9 0.6792256 0.6863198 5.424333 4.133333 5 0 20173 7124
10 0.6976214 0.7370875 5.419677 4.350000 3 3 21410 7906
11 0.6600337 0.6615607 5.450000 4.184333 3 2 19603 6867
12 0.6761812 0.6773261 5.347097 4.318710 2 2 20752 7265
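My goal is to reshape it into the long format shown below, where the columns Cabo_BU_PCT and Acapulco_BU_PCT are stacked under the column name BU_PCT, the columns Cabo_LOS_AVG and Acapulco_LOS_AVG are stacked under the column name LOS_AVG, and so on.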
Month Location BU_PCT LOS_AVG BED_BUGS TOTAL_OCCUPIED
1 Cabo 0.6470034 5.223000 5 19216
1 Acapulco 0.6260116 4.307667 3 6498
2 Cabo 0.6167027 5.893571 3 17095
2 Acapulco 0.6777457 4.247500 0 6566
.
.
.
12 Cabo 0.6761812 5.347097 2 20752
12 Acapulco 0.6773261 4.318710 2 7265
Any help reshaping this data frame is much appreciated. Thanks.
======== dataset ===========
df_wide <- structure(list(Month = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
), Cabo_BU_PCT = c(0.647003367003367, 0.616702741702742, 0.637210817855979,
0.635791245791246, 0.644900619094168, 0.674781144781145, 0.669794721407625,
0.65959595959596, 0.679225589225589, 0.69762137504073, 0.66003367003367,
0.676181166503747), Acapulco_BU_PCT = c(0.626011560693642, 0.677745664739884,
0.634812604885325, 0.654816955684008, 0.640965877307477, 0.69354527938343,
0.693268692895767, 0.677792280440052, 0.686319845857418, 0.737087451053515,
0.661560693641619, 0.677326123438374), Cabo_LOS_AVG = c(5.223,
5.89357142857143, 5.22967741935484, 5.35666666666667, 5.3441935483871,
5.81266666666667, 5.54451612903226, 5.26032258064516, 5.42433333333333,
5.41967741935484, 5.45, 5.34709677419355), Acapulco_LOS_AVG = c(4.30766666666667,
4.2475, 4.32774193548387, 4.22, 4.16290322580645, 4.362, 4.46290322580645,
4.1358064516129, 4.13333333333333, 4.35, 4.18433333333333, 4.31870967741935
), BED_BUGS_Cabo = c(5, 3, 5, 4, 2, 4, 5, 0, 5, 3, 3, 2), BED_BUGS_Acapulco = c(3,
0, 1, 6, 5, 3, 6, 7, 0, 3, 2, 2), TOTAL_OCCUPIED_Cabo = c(19216,
17095, 19556, 18883, 19792, 20041, 20556, 20243, 20173, 21410,
19603, 20752), TOTAL_OCCUPIED_Acapulco = c(6498, 6566, 6809,
6797, 6875, 7199, 7436, 7270, 7124, 7906, 6867, 7265)), class = c("tbl_df",
"tbl", "data.frame"), .Names = c("Month", "Cabo_BU_PCT", "Acapulco_BU_PCT",
"Cabo_LOS_AVG", "Acapulco_LOS_AVG", "BED_BUGS_Cabo", "BED_BUGS_Acapulco",
"TOTAL_OCCUPIED_Cabo", "TOTAL_OCCUPIED_Acapulco"), row.names = c(NA,
-12L))
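Answer 0 (score: 5)
If you only have two locations, you can put them in a regular expression, allowing for the fact that they may appear at the beginning or the end of a name: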
{{1}}
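As a rough illustration of that idea in base R (a sketch, not necessarily the code originally posted in this answer): the two location names are hard-coded into a regular expression that moves the location to the end of every column name, after which reshape() can split the names on a separator. The separator ".", the helper names locs, pat and nm, and the three-row sample of df_wide are choices made for this example only.

```r
# First three rows of the question's data (values as printed in the question)
df_wide <- data.frame(
  Month = c(1, 2, 3),
  Cabo_BU_PCT = c(0.6470034, 0.6167027, 0.6372108),
  Acapulco_BU_PCT = c(0.6260116, 0.6777457, 0.6348126),
  Cabo_LOS_AVG = c(5.223000, 5.893571, 5.229677),
  Acapulco_LOS_AVG = c(4.307667, 4.247500, 4.327742),
  BED_BUGS_Cabo = c(5, 3, 5),
  BED_BUGS_Acapulco = c(3, 0, 1),
  TOTAL_OCCUPIED_Cabo = c(19216, 17095, 19556),
  TOTAL_OCCUPIED_Acapulco = c(6498, 6566, 6809)
)

# The two known locations, joined into one regex alternation
locs <- c("Cabo", "Acapulco")
pat <- paste(locs, collapse = "|")

# Normalise every name to "<variable>.<location>":
# leading location, e.g. "Cabo_BU_PCT"    -> "BU_PCT.Cabo"
# trailing location, e.g. "BED_BUGS_Cabo" -> "BED_BUGS.Cabo"
nm <- names(df_wide)
nm <- sub(paste0("^(", pat, ")_(.*)$"), "\\2.\\1", nm)
nm <- sub(paste0("^(.*)_(", pat, ")$"), "\\1.\\2", nm)

# reshape() splits the names on sep to find variables and locations.
# This assumes the locations appear in the same order within every pair.
df_long <- reshape(setNames(df_wide, nm), direction = "long",
                   varying = 2:9, sep = ".",
                   timevar = "Location", idvar = "Month")

# Sort by month; ties keep their original order (Cabo first)
df_long <- df_long[order(df_long$Month), ]
rownames(df_long) <- NULL
df_long
```

With more than two locations, the same pattern should carry over by adding the extra names to locs, provided no location name also appears inside a variable name.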
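Answer 1 (score: 2)
This uses reshape from base R. No packages are used. varying= specifies that columns 2 and 3 are to be combined, then columns 4 and 5, and so on. The new columns are given the names specified in v.names=, and the locations are specified in times=.
We could derive the varying=, v.names= and times= arguments from the column headers, but because the headers are irregular this involves a messy regular expression, so it is simpler to write them out (we do, however, show how to derive them further below).
The result is sorted by location and then by month within location, but it can be re-sorted if needed.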
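# as.data.frame() guards against tibble subsetting rules in case tibble is loaded
df_long <- reshape(as.data.frame(df_wide), dir = "long",
  varying = list(2:3, 4:5, 6:7, 8:9),  # column pairs to stack
  v.names = c("BU_PCT", "LOS_AVG", "BED_BUGS", "TOTAL_OCCUPIED"),
  times = c("Cabo", "Acapulco"))[-7]   # [-7] drops the generated id column
# Column 2 is the generated "time" column; rename it
names(df_long)[2] <- "LOCATION"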
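If the month-then-location order from the question is wanted, the re-sorting could look something like the sketch below. The small df_long here is a hand-built stand-in with the same shape as the reshape output (only one value column is kept), and loc_order and df_sorted are names introduced for this example.

```r
# Stand-in for the reshape output: sorted by location, then month
df_long <- data.frame(
  Month = c(1, 2, 1, 2),
  LOCATION = c("Cabo", "Cabo", "Acapulco", "Acapulco"),
  BED_BUGS = c(5, 3, 3, 0)
)

# Desired order of locations within each month
loc_order <- c("Cabo", "Acapulco")

# Sort by month first, then by position of the location in loc_order
df_sorted <- df_long[order(df_long$Month,
                           match(df_long$LOCATION, loc_order)), ]
rownames(df_sorted) <- NULL
df_sorted
```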
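Alternatively, if we did want to derive varying=, v.names= and times= from names(df_wide), it could be done like this. names1 is names(df_wide) without the location names. We use the fact that the location names consist of lowercase letters except for the first letter, and that they sit at the beginning or the end of each name.
{{1}}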
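A minimal sketch of how that derivation might go, under the assumptions just stated (a location is one capital letter followed by lowercase letters, at the start or end of a name). Apart from names1, the variable names (nms, locations) and the three-row sample of df_wide are inventions of this example, and the regular expressions are one possible choice rather than the answer's original ones.

```r
# First three rows of the question's data (values as printed in the question)
df_wide <- data.frame(
  Month = c(1, 2, 3),
  Cabo_BU_PCT = c(0.6470034, 0.6167027, 0.6372108),
  Acapulco_BU_PCT = c(0.6260116, 0.6777457, 0.6348126),
  Cabo_LOS_AVG = c(5.223000, 5.893571, 5.229677),
  Acapulco_LOS_AVG = c(4.307667, 4.247500, 4.327742),
  BED_BUGS_Cabo = c(5, 3, 5),
  BED_BUGS_Acapulco = c(3, 0, 1),
  TOTAL_OCCUPIED_Cabo = c(19216, 17095, 19556),
  TOTAL_OCCUPIED_Acapulco = c(6498, 6566, 6809)
)

nms <- names(df_wide)[-1]  # every column except Month

# names1: column names with the leading or trailing location removed
names1 <- sub("^[A-Z][a-z]+_|_[A-Z][a-z]+$", "", nms, perl = TRUE)

# The location is the first capital letter followed by lowercase letters
locations <- regmatches(nms, regexpr("[A-Z][a-z]+", nms, perl = TRUE))

v.names <- unique(names1)     # "BU_PCT" "LOS_AVG" "BED_BUGS" "TOTAL_OCCUPIED"
times <- unique(locations)    # "Cabo" "Acapulco"

# One character vector of wide column names per long variable.
# This assumes the locations appear in the same order within every group.
varying <- unname(split(nms, factor(names1, levels = v.names)))

df_long <- reshape(df_wide, direction = "long",
                   varying = varying, v.names = v.names, times = times,
                   timevar = "Location", idvar = "Month")
df_long
```

Passing timevar= and idvar= here avoids the generated id column, so no column needs to be dropped or renamed afterwards.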