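I have a list of 15 data frames, each with 13 columns (time + 6 stations, each with 3 layers) and 172 rows. I want to apply a function over the whole list that collapses those columns (the observations at the stations) into essentially two columns (one for the station, one for the observation). For this I use gather from tidyr. In addition, I want to find a pattern (upper, middle or lower layer) in one of the columns and assign a new value (depth) in a new column. For that I use ddply from plyr together with grep. My problem is that it is very slow. I guess I have created a bottleneck with my limited knowledge of R. So where is the bottleneck, and how can I improve it?
An example: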
data <- list(a = data.frame(time = (1:180), alpha.upper = sample(1:180),
                            beta.middle = sample(1:180), gamma.lower = sample(1:180)),
             b = data.frame(time = (1:180), alpha.upper = sample(1:180),
                            beta.middle = sample(1:180), gamma.lower = sample(1:180)))
> data
$a
time alpha.upper beta.middle gamma.lower
1 1 133 179 99
2 2 175 147 56
3 3 169 9 24
4 4 116 129 75
5 5 92 65 65
6 6 141 73 49
$b
time alpha.upper beta.middle gamma.lower
1 1 111 2 89
2 2 84 81 159
3 3 93 82 84
4 4 44 58 125
5 5 31 33 131
6 6 1 120 63
My code is:
library(tidyr)  # gather
library(plyr)   # ddply (plyr also provides arrange)
data2 <- lapply(data, function(x) {
  x <- gather(x, stn, value, -time)
  x <- arrange(x, time)
  x <- ddply(x, c("time", "stn", "value"), function(x) {
    if (grepl(".upper", x$stn) == TRUE)
    {
      x$depth <- 1
      return(x)
    }
    if (grepl(".lower", x$stn) == TRUE)
    {
      x$depth <- 3
      return(x)
    }
    if (grepl(".middle", x$stn) == TRUE)
    {
      x$depth <- 2
      return(x)
    }
  })
  return(x)
})
The result should be:
> data2
$a
time stn value depth
1 1 alpha.upper 111 1
2 1 beta.middle 2 2
3 1 gamma.lower 89 3
4 2 alpha.upper 84 1
5 2 beta.middle 81 2
6 2 gamma.lower 159 3
$b
1 1 alpha.upper 38 1
2 1 beta.middle 151 2
3 1 gamma.lower 93 3
4 2 alpha.upper 61 1
5 2 beta.middle 56 2
6 2 gamma.lower 66 3
Answer 0 (score: 0)
First, let's reproduce your data.
dataa <- read.table(text =
"time alpha.upper beta.middle gamma.lower
1 133 179 99
2 175 147 56
3 169 9 24
4 116 129 75
5 92 65 65
6 141 73 49", header = T, sep = " ")
datab <- read.table(text =
"time alpha.upper beta.middle gamma.lower
1 1 111 2 89
2 2 84 81 159
3 3 93 82 84
4 4 44 58 125
5 5 31 33 131
6 6 1 120 63", header = T, sep = " ")
mydata <- list(a = dataa, b = datab)
# $a
# time alpha.upper beta.middle gamma.lower
# 1 1 133 179 99
# 2 2 175 147 56
# 3 3 169 9 24
# 4 4 116 129 75
# 5 5 92 65 65
# 6 6 141 73 49
# $b
# time alpha.upper beta.middle gamma.lower
# 1 1 111 2 89
# 2 2 84 81 159
# 3 3 93 82 84
# 4 4 44 58 125
# 5 5 31 33 131
# 6 6 1 120 63
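Here I named the variable mydata, because there is already a function called data in the standard package utils, so it is better not to use that name for a variable.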
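As a quick check of the naming point above, this small sketch inspects the existing data object in base R. The variable names is_fun and pkg are only illustrative and do not come from the original post.

```r
# utils::data is the function that loads data sets shipped with packages
is_fun <- is.function(utils::data)

# the environment of that function is the namespace of the package it lives in
pkg <- environmentName(environment(utils::data))

print(is_fun)
print(pkg)
```

A variable called data would not break calls such as data(iris), but it makes the code harder to read, which is the reason for choosing mydata here.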
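As far as I understand, you need to reshape each data.frame in the list from "wide" form to "long" form. You can use gather from the tidyr package, which in my opinion is a smart choice, but here I show how to get the same result with base R tools.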
rebuilddf <- function(df)
{ # first of all see the difference between rep(1:3, each = 3) and rep(1:3, times = 3)
res_df <- data.frame(
time = rep(df$time, each = 3),# first column of new data.frame -
# we repeat each time mark 3 times
# as we know that there are exactly 3
# observations for single time: upper, middle, lower
stn = rep(colnames(df)[-1], times = nrow(df)), # second column
# fill it with words "alpha.upper",
# "beta.middle", "gamma.lower" which are colnames(df)[-1]
# repeated nrow(df) times
value = as.vector(t(as.matrix(df[,-1]))) ) #
# numbers of 2:4 columns of our data.frame are
# transposed and then arranged in a vector
# the result is like reading it row by row
# to understand what's happening with the matrix you can try this code
# m <- matrix(1:20, nrow = 4)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 5 9 13 17
# [2,] 2 6 10 14 18
# [3,] 3 7 11 15 19
# [4,] 4 8 12 16 20
# as.vector(t(m))
# 1 5 9 13 17 2 6 10 14 18 3 7 11 15 19 4 8 12 16 20
# after that we add column "depth"
# as I got it, we need 1 for "upper", 2 for "middle" and 3 for "lower"
# we make it with the help of two nested ifelse functions
res_df <- transform(res_df, depth = ifelse(stn == "alpha.upper", 1,
ifelse(stn == "beta.middle", 2, 3)) )
return(res_df)
}
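To make the first comment in rebuilddf concrete, here is a minimal sketch of the three building blocks on a two-row data frame. The data frame toy and the long_* names are made up for illustration and are not part of the original data.

```r
# each = 3 repeats every element three times in a row;
# times = 3 repeats the whole vector three times
by_each  <- rep(1:3, each = 3)   # 1 1 1 2 2 2 3 3 3
by_times <- rep(1:3, times = 3)  # 1 2 3 1 2 3 1 2 3

# hypothetical two-row example with the same column layout as the question
toy <- data.frame(time = 1:2,
                  alpha.upper = c(10, 20),
                  beta.middle = c(30, 40),
                  gamma.lower = c(50, 60))

long_time  <- rep(toy$time, each = 3)                     # one time mark per station
long_stn   <- rep(colnames(toy)[-1], times = nrow(toy))   # station names cycle per time mark
long_value <- as.vector(t(as.matrix(toy[, -1])))          # values read row by row

print(data.frame(time = long_time, stn = long_stn, value = long_value))
```

The three vectors line up because all of them walk through the wide table one row at a time, station by station.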
If the column names are not always the same and only the end of each name stays constant, we can modify the condition for depth as follows:
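# strsplit returns one character vector per row of stn, so we take
# the last piece of each one with sapply; using [[1]] alone would
# look only at the first row and give every row the same depth
last_part <- sapply(strsplit(as.character(res_df$stn), "[.]"),
                    function(p) rev(p)[1])
res_df <-
  transform(res_df,
            depth = ifelse(last_part == "upper",
                           1,
                           ifelse(last_part == "middle", 2, 3)
            ) )
# we work with
# rev(p)[1], where p is one element of strsplit(stn, "[.]")
# it may be "upper", "middle" or "lower"
# here we split character string of form "some.name1.upper" or
# "some.other.colname.lower" by every dot in text, then take
# the first from end part of the string (rev does reversing order)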
You could also modify the condition to use grepl, but I believe strsplit will be faster.
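For comparison, this is a rough sketch of what a grepl-based condition could look like, together with a lookup on the name suffix. The vector stn_example and the names depth_grepl, suffix and depth_match are assumptions made for this illustration. I have not timed either variant against strsplit.

```r
# hypothetical station names, including one with several dots
stn_example <- c("alpha.upper", "beta.middle", "gamma.lower",
                 "some.other.colname.lower")

# grepl is vectorised, so it can be used directly inside ifelse;
# "[.]" matches a literal dot (a bare "." matches any character)
# and "$" anchors the pattern at the end of the name
depth_grepl <- ifelse(grepl("[.]upper$", stn_example), 1,
               ifelse(grepl("[.]middle$", stn_example), 2, 3))

# alternative: strip everything up to the last dot, then look the
# suffix up in the ordered vector of layers
suffix      <- sub(".*[.]", "", stn_example)
depth_match <- match(suffix, c("upper", "middle", "lower"))

print(depth_grepl)
print(depth_match)
```

Both variants work on the whole column at once, so no per-row function call is needed.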
Now that the rebuilddf function is finished, let's see what it does.
lapply(mydata, rebuilddf)
# $a
# time stn value depth
# 1 1 alpha.upper 133 1
# 2 1 beta.middle 179 2
# 3 1 gamma.lower 99 3
# 4 2 alpha.upper 175 1
# 5 2 beta.middle 147 2
# 6 2 gamma.lower 56 3
# 7 3 alpha.upper 169 1
# 8 3 beta.middle 9 2
# 9 3 gamma.lower 24 3
# 10 4 alpha.upper 116 1
# 11 4 beta.middle 129 2
# 12 4 gamma.lower 75 3
# 13 5 alpha.upper 92 1
# 14 5 beta.middle 65 2
# 15 5 gamma.lower 65 3
# 16 6 alpha.upper 141 1
# 17 6 beta.middle 73 2
# 18 6 gamma.lower 49 3
#
# $b
# time stn value depth
# 1 1 alpha.upper 111 1
# 2 1 beta.middle 2 2
# 3 1 gamma.lower 89 3
# 4 2 alpha.upper 84 1
# 5 2 beta.middle 81 2
# 6 2 gamma.lower 159 3
# 7 3 alpha.upper 93 1
# 8 3 beta.middle 82 2
# 9 3 gamma.lower 84 3
# 10 4 alpha.upper 44 1
# 11 4 beta.middle 58 2
# 12 4 gamma.lower 125 3
# 13 5 alpha.upper 31 1
# 14 5 beta.middle 33 2
# 15 5 gamma.lower 131 3
# 16 6 alpha.upper 1 1
# 17 6 beta.middle 120 2
# 18 6 gamma.lower 63 3
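I would like to believe this is the output you want, but the expected result in your question shows the values of data frame b under a, and vice versa.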