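I have a large dataset that I need to convert from wide format to long format. That should be simple, and there are plenty of examples on this forum of how to do it. In this case, however, I also need to split the column headers used in the wide format and create one column in the long format for each part of the header.
Example dataset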
data <- data.frame("East2010"=1:3, "West2010"=4:6, "East2011"=7:9, "West2011"=5:7)
data
  East2010 West2010 East2011 West2011
1        1        4        7        5
2        2        5        8        6
3        3        6        9        7
What I would like is something like this
Site Year Response
East 2010 1
East 2010 2
East 2010 3
West 2010 4
West 2010 5
West 2010 6
East 2011 7
East 2011 8
East 2011 9
West 2011 5
West 2011 6
West 2011 7
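I have looked at many examples on this forum. Some convert data to long format and others split a string at a separator, but I have not been able to get the two to work together.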
Answer 0 (score: 4)
And now for the "modern" :-) approach to this:
library(dplyr)
library(tidyr)
data %>%
gather(var, Response, East2010:West2011) %>% ## Makes wide data long
separate(var, c("Site", "Year"), sep = -5) ## Splits up a column
# Site Year Response
# 1 East 2010 1
# 2 East 2010 2
# 3 East 2010 3
# 4 West 2010 4
# 5 West 2010 5
# 6 West 2010 6
# 7 East 2011 7
# 8 East 2011 8
# 9 East 2011 9
# 10 West 2011 5
# 11 West 2011 6
# 12 West 2011 7
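The sep = -5 above means counting backwards from the end of the string and splitting there, so the four-digit year is cut off whatever the length of the site name. A name such as "North2010" would therefore still work. (Negative positions were renumbered in tidyr 1.0.0, where -1 became the far-right position; on current versions the equivalent value should be sep = -4.)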
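The idea of splitting at a position counted from the end can be sketched in base R, which avoids depending on how a particular tidyr version numbers negative positions. This is only an illustration: split_from_end is a hypothetical helper, not part of tidyr, and it assumes the year always occupies the last four characters.

```r
# Hypothetical helper (not a tidyr function): split each string
# n_year characters before its end.
split_from_end <- function(x, n_year = 4) {
  cut <- nchar(x) - n_year            # index of the last character of the site name
  data.frame(
    Site = substr(x, 1, cut),
    Year = substr(x, cut + 1, nchar(x)),
    stringsAsFactors = FALSE
  )
}

# "North2011" is a made-up name with a longer site part
parts <- split_from_end(c("East2010", "North2011"))
parts
```

Because the cut point is derived from nchar(), site names of any length are handled as long as the year keeps a fixed width.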
That said, using a regular expression as @David does is more robust, and that is also possible within separate:
data %>%
gather(var, Response, East2010:West2011) %>%
separate(var, c("Site", "Year"),
sep = "(?<=[[:alpha:]])(?=[[:digit:]])",
perl = TRUE)
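On recent tidyr versions, where gather() is documented as superseded, the reshape and the split can be sketched in a single step with pivot_longer(). This is a swapped-in alternative rather than the code from this answer; it assumes tidyr 1.0.0 or later, and the name long is only for illustration.

```r
library(tidyr)

data <- data.frame(East2010 = 1:3, West2010 = 4:6, East2011 = 7:9, West2011 = 5:7)

long <- pivot_longer(
  data,
  cols = everything(),
  names_to = c("Site", "Year"),                    # one new column per capture group
  names_pattern = "([[:alpha:]]+)([[:digit:]]+)",  # letters, then digits
  values_to = "Response"
)
long
```

Two differences from the gather() version are worth noting: the result is a tibble, and the rows come out in row-by-row order of the original data (East2010, West2010, East2011, West2011 for row 1, and so on) rather than column by column. Year is returned as character here.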
Answer 1 (score: 3)
Or, if the column names are not always the same width: here I use a "lookahead" and a "lookbehind" to split the letters from the digits.
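library(reshape2)
data <- melt(data)  # no id variables, so every column is melted into variable/value
# Split each name at the zero-width boundary between a letter and a digit
temp <- strsplit(as.character(data$variable), "(?<=[[:alpha:]])(?=[[:digit:]])", perl = TRUE)
transform(data, Site = sapply(temp, "[", 1), Year = sapply(temp, "[", 2))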
# variable value Site Year
#1 East2010 1 East 2010
#2 East2010 2 East 2010
#3 East2010 3 East 2010
#4 West2010 4 West 2010
#5 West2010 5 West 2010
#6 West2010 6 West 2010
#7 East2011 7 East 2011
#8 East2011 8 East 2011
#9 East2011 9 East 2011
#10 West2011 5 West 2011
#11 West2011 6 West 2011
#12 West2011 7 West 2011
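To see why the lookaround pattern copes with names of different widths, here is a minimal base R sketch on two names. "North2011" is made up for the illustration and does not appear in the question's data.

```r
x <- c("East2010", "North2011")

# Zero-width match: the position preceded by a letter and followed by a digit
boundary <- "(?<=[[:alpha:]])(?=[[:digit:]])"
pieces <- strsplit(x, boundary, perl = TRUE)

site <- vapply(pieces, `[`, character(1), 1)
year <- as.integer(vapply(pieces, `[`, character(1), 2))

data.frame(Site = site, Year = year)
```

The pattern consumes no characters, so nothing is lost at the split point, and the split lands wherever the letters end rather than at a fixed offset.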
Answer 2 (score: 2)
Something along these lines should work:
library("plyr")
library("reshape2")
m.data <- melt(data)
# Fixed positions: assumes every site name is exactly four characters long
m.data <- mutate(m.data, Site = substr(variable, 1, 4),
                 Year = substr(variable, 5, 8))
Which results in:
> m.data
variable value Site Year
1 East2010 1 East 2010
2 East2010 2 East 2010
3 East2010 3 East 2010
4 West2010 4 West 2010
5 West2010 5 West 2010
6 West2010 6 West 2010
7 East2011 7 East 2011
8 East2011 8 East 2011
9 East2011 9 East 2011
10 West2011 5 West 2011
11 West2011 6 West 2011
12 West2011 7 West 2011
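If loading plyr and reshape2 is not desirable, the same melt-then-split idea can be sketched with base R only. This is a variation on the answer, not its original code: it swaps melt() for stack() and the fixed substr() positions for sub(), so the site names do not have to be four characters long.

```r
data <- data.frame(East2010 = 1:3, West2010 = 4:6, East2011 = 7:9, West2011 = 5:7)

stacked <- stack(data)                 # columns: values, ind (the former column name)
name <- as.character(stacked$ind)

long <- data.frame(
  Site = sub("[[:digit:]]+$", "", name),              # drop the trailing digits
  Year = as.integer(sub("^[[:alpha:]]+", "", name)),  # drop the leading letters
  Response = stacked$values,
  stringsAsFactors = FALSE
)
long
```

The rows come out column by column, in the same order as the desired output in the question.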