I have a data frame in wide format.
df <- data.frame(
time = as.Date('2009-01-01') + 0:5,
D.13.JA = rnorm(6, 0, 1),
D.40.JA = rnorm(6, 0, 1),
D.90.JA = rnorm(6, 0, 1),
A.13.JA = rnorm(6, 0, 1),
R.13.JA = rnorm(6, 0, 1)
)
time D.13.JA D.40.JA D.90.JA A.13.JA R.13.JA
1 2009-01-01 -2.2529442 0.1341954 0.3024757 -0.465533145 -0.49755117
2 2009-01-02 1.0698570 -1.3597724 0.6607091 0.001913148 0.92522135
3 2009-01-03 1.7558374 -1.0280084 -0.1446586 -0.355776775 0.12556738
4 2009-01-04 -0.2571767 -0.9065826 0.9340532 -0.150408270 -0.57386938
5 2009-01-05 0.2389923 -1.2818616 0.5643812 -1.272623868 -0.05700965
6 2009-01-06 1.6444592 -1.5610767 -1.4377561 -0.701273356 0.29777858
I would like to convert the data frame into the following format:
time DirDegree Type Wh
1 2009-01-01 D.13 JA -2.2529442
2 2009-01-02 D.13 JA 1.0698570
3 2009-01-03 D.13 JA 1.7558374
4 2009-01-04 D.13 JA -0.2571767
5 2009-01-05 D.13 JA 0.2389923
6 2009-01-06 D.13 JA 1.6444592
So far, I have managed to convert it to a tidy format:
library(tidyr)  # provides gather(), separate(), extract() and re-exports %>%
df.tidy = df %>%
  gather(key, Wh, -time) %>%
  separate(key, c("Dir", "Degree", "Type"), "\\.")
time Dir Degree Type Wh
1 2009-01-01 D 13 JA -1.18105757
2 2009-01-02 D 13 JA 1.34437449
3 2009-01-03 D 13 JA -0.08451173
4 2009-01-04 D 13 JA -1.88959285
5 2009-01-05 D 13 JA 1.25388470
6 2009-01-06 D 13 JA -1.24286611
I tried to format it following this answer:
test1 = df %>%
gather(key, value, -time) %>%
extract(key, c("DirDeg", "Type"), "(..\\..)\\.(.)")
test2 = df %>%
gather(key, value, -time) %>%
extract(key, c("DirDeg", "Type"), "(\\.)\\.()")
Both give me:
time DirDeg Type value
1 2009-01-01 <NA> <NA> -1.18105757
2 2009-01-02 <NA> <NA> 1.34437449
3 2009-01-03 <NA> <NA> -0.08451173
4 2009-01-04 <NA> <NA> -1.88959285
5 2009-01-05 <NA> <NA> 1.25388470
6 2009-01-06 <NA> <NA> -1.24286611
7 2009-01-01 <NA> <NA> -0.55782526
Answer 0: (score: 1)
You can do:
library(tidyr)  # provides gather() and extract(), and re-exports %>%
df.tidy = df %>%
  gather(key, Wh, -time) %>%
  extract(key, c("DirDeg", "Type"), "(.*)\\.(\\w+)$")
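The first group, (.*), is greedy, so it captures everything up to the last . in the key. The second group, (\\w+)$, captures the run of alphanumeric characters at the end of the key.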
Result:
time DirDeg Type Wh
1 2009-01-01 D.13 JA -2.2529442
2 2009-01-02 D.13 JA 1.0698570
3 2009-01-03 D.13 JA 1.7558374
4 2009-01-04 D.13 JA -0.2571767
5 2009-01-05 D.13 JA 0.2389923
6 2009-01-06 D.13 JA 1.6444592
7 2009-01-01 D.40 JA 0.1341954
8 2009-01-02 D.40 JA -1.3597724
9 2009-01-03 D.40 JA -1.0280084
10 2009-01-04 D.40 JA -0.9065826
11 2009-01-05 D.40 JA -1.2818616
12 2009-01-06 D.40 JA -1.5610767
13 2009-01-01 D.90 JA 0.3024757
14 2009-01-02 D.90 JA 0.6607091
15 2009-01-03 D.90 JA -0.1446586
16 2009-01-04 D.90 JA 0.9340532
17 2009-01-05 D.90 JA 0.5643812
18 2009-01-06 D.90 JA -1.4377561
19 2009-01-01 A.13 JA -0.465533145
20 2009-01-02 A.13 JA 0.001913148
21 2009-01-03 A.13 JA -0.355776775
22 2009-01-04 A.13 JA -0.150408270
23 2009-01-05 A.13 JA -1.272623868
24 2009-01-06 A.13 JA -0.701273356
25 2009-01-01 R.13 JA -0.49755117
26 2009-01-02 R.13 JA 0.92522135
27 2009-01-03 R.13 JA 0.12556738
28 2009-01-04 R.13 JA -0.57386938
29 2009-01-05 R.13 JA -0.05700965
30 2009-01-06 R.13 JA 0.29777858
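To see why the two patterns from the question produce NA while this one works, the patterns can be tried on the column names directly. The following is a minimal base R sketch, not part of the original answer: the vector keys and the result names are made up for illustration, and perl = TRUE is used here as an approximation of the regex engine that extract() uses internally.

```r
# Column names from the question's data frame (hypothetical helper vector)
keys <- c("D.13.JA", "D.40.JA", "D.90.JA", "A.13.JA", "R.13.JA")

# Patterns tried in the question. The first needs two dots exactly two
# characters apart, the second needs two consecutive dots; no key has either.
q1_matches <- grepl("(..\\..)\\.(.)", keys, perl = TRUE)
q2_matches <- grepl("(\\.)\\.()", keys, perl = TRUE)

# Pattern from this answer: greedy (.*) runs to the last dot,
# (\\w+)$ captures the trailing word characters.
pattern <- "(.*)\\.(\\w+)$"
parts <- regmatches(keys, regexec(pattern, keys, perl = TRUE))

# Each element of parts is c(full match, group 1, group 2)
dir_deg <- vapply(parts, `[`, character(1), 2)
type <- vapply(parts, `[`, character(1), 3)

print(data.frame(key = keys, DirDeg = dir_deg, Type = type))
```

Since neither of the question's patterns matches any key, extract() has nothing to put in the new columns and fills them with NA.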
Answer 1: (score: 1)
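We can also use separate. The . appears in two contexts in the keys: 1) a . followed by a digit, and 2) a . followed by an uppercase letter. If we pass a regex lookahead as sep so that only the . immediately before an uppercase letter matches (the second case), each key is split at that point only: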
library(tidyverse)
df %>%
gather(key, Wh, -time) %>%
separate(key, into = c("DirDeg", "Type"), sep = "\\.(?=[A-Z])") %>%
as_tibble
# A tibble: 30 x 4
# time DirDeg Type Wh
# <date> <chr> <chr> <dbl>
# 1 2009-01-01 D.13 JA -0.546
# 2 2009-01-02 D.13 JA 0.537
# 3 2009-01-03 D.13 JA 0.420
# 4 2009-01-04 D.13 JA -0.584
# 5 2009-01-05 D.13 JA 0.847
# 6 2009-01-06 D.13 JA 0.266
# 7 2009-01-01 D.40 JA 0.445
# 8 2009-01-02 D.40 JA -0.466
# 9 2009-01-03 D.40 JA -0.848
#10 2009-01-04 D.40 JA 0.00231
# … with 20 more rows
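The effect of the lookahead can also be checked outside of tidyr. Below is a small base R sketch, assuming a few of the question's column names as input; keys, pieces, naive and split_mat are illustrative names, not from the original answer. strsplit() needs perl = TRUE for the (?=...) syntax.

```r
# A few column names from the question (hypothetical helper vector)
keys <- c("D.13.JA", "D.40.JA", "A.13.JA")

# Split only at a dot that is immediately followed by an uppercase letter.
# The lookahead (?=[A-Z]) inspects the next character without consuming it,
# so the letter stays in the second piece.
pieces <- strsplit(keys, "\\.(?=[A-Z])", perl = TRUE)

# For comparison: splitting at every dot yields three pieces per key
naive <- strsplit(keys, "\\.")

# Bind the two-piece results into a matrix with the answer's column names
split_mat <- do.call(rbind, pieces)
colnames(split_mat) <- c("DirDeg", "Type")
print(split_mat)
```

The dot between the letter and the number is followed by a digit, so the lookahead fails there and that dot is left in place.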