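I have a large dataset with many variables, and I would like to run some calculations on all of the variables, grouped by a factor, and get the results back in a tidy data frame. My data might look like this:
Example data: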
df <- data.frame(
  hour  = factor(rep(1:24, each = 100)),
  price = runif(20) * 100,  # only 20 values; data.frame() recycles them to 2400 rows
  cons  = sample(1:100, 2400, replace = TRUE),
  wind  = sample(1:100, 2400, replace = TRUE),
  solar = sample(1:100, 2400, replace = TRUE)
)
I want to run some simple calculations on each variable, by factor, using a function like this:
fx <- function(x) {
  n      <- length(x)
  mean   <- mean(x)
  median <- median(x)
  std    <- sd(x)
  var    <- var(x)
  max    <- max(x)
  min    <- min(x)
  # results <- list(n, mean, median, std, var, max, min)
  # return(results)
}
It would be great to have the results in a data frame like this:
datasummary:
hour(factor) length(price) mean(price) ... min(price) length(cons) ... etc
1
2
3
..
24
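This works fine if I do it manually for each variable, but I suspect there must be an easier way using plyr or some apply trick. However, I can't figure out how to go from a single variable to the whole data frame, or how to turn the result back into a data frame.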
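For reference, the manual one-variable-at-a-time approach described above might look something like the following base R sketch. This is an assumption about what "manually" means here, not the asker's actual code; the data is a cut-down stand-in for the example (fewer columns, `runif(2400)` instead of `runif(20)`), and `fx` is shortened to three statistics.

```r
set.seed(1)

# Cut-down stand-in for the example data (assumption: two value columns only)
df <- data.frame(
  hour  = factor(rep(1:24, each = 100)),
  price = runif(2400) * 100,
  cons  = sample(1:100, 2400, replace = TRUE)
)

# Shortened summary function returning a named numeric vector
fx <- function(x) c(n = length(x), mean = mean(x), sd = sd(x))

# One variable at a time: split by hour, summarise each piece,
# then transpose so that each hour becomes a row
price_by_hour <- t(sapply(split(df$price, df$hour), fx))
price_summary <- data.frame(hour = rownames(price_by_hour), price_by_hour)

head(price_summary)
```

Repeating this for every column and binding the pieces together is the part that gets tedious, which is what the answers below try to avoid.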
Answer 0 (score: 2)
Use the base R function aggregate:
set.seed(1)  # your data; set.seed(1) is for reproducibility
df <- data.frame(
  hour  = factor(rep(1:24, each = 100)),
  price = runif(20) * 100,
  cons  = sample(1:100, 2400, replace = TRUE),
  wind  = sample(1:100, 2400, replace = TRUE),
  solar = sample(1:100, 2400, replace = TRUE)
)
# a slightly modified version of your function: it returns a named vector
fx <- function(x) {
  c(n = length(x), mean = mean(x),
    median = quantile(x, .5),  # quantile() names this element "50%", hence the ".50." suffix in the output
    std = sd(x), var = var(x), max = max(x), min = min(x))
}
# applying your function and getting results
> agresult <- aggregate(. ~ hour, FUN = fx, data = df)
> agresult <- do.call(data.frame, agresult)
> agresult[1:6, 1:8]
  hour price.n price.mean price.median.50. price.std price.var price.max price.min
1    1     100   55.51671         60.09837  28.02782  785.5584  99.19061  6.178627
2    2     100   55.51671         60.09837  28.02782  785.5584  99.19061  6.178627
3    3     100   55.51671         60.09837  28.02782  785.5584  99.19061  6.178627
4    4     100   55.51671         60.09837  28.02782  785.5584  99.19061  6.178627
5    5     100   55.51671         60.09837  28.02782  785.5584  99.19061  6.178627
6    6     100   55.51671         60.09837  28.02782  785.5584  99.19061  6.178627
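The `do.call(data.frame, ...)` step is easy to overlook, so here is a small sketch of why it seems to be needed: when `FUN` returns a vector, `aggregate` stores each variable's statistics as a single matrix column, and `do.call(data.frame, ...)` spreads those matrices into ordinary columns. The data and the three-statistic `fx` below are simplified stand-ins, not the code from the answer.

```r
set.seed(1)

# Simplified stand-in data (assumption: two value columns only)
df <- data.frame(
  hour  = factor(rep(1:24, each = 100)),
  price = runif(2400) * 100,
  cons  = sample(1:100, 2400, replace = TRUE)
)

# Shortened summary function returning a named vector
fx <- function(x) c(n = length(x), mean = mean(x), max = max(x))

raw <- aggregate(. ~ hour, data = df, FUN = fx)
ncol(raw)             # hour, price, cons: the statistics are hidden inside matrices
is.matrix(raw$price)  # each value column is a 24 x 3 matrix

# Flatten the matrix columns into one plain column per statistic
flat <- do.call(data.frame, raw)
names(flat)
```

After flattening, each statistic gets its own column named `<variable>.<statistic>`, which is the layout asked for in the question.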
Answer 1 (score: 1)
This is what numcolwise is for. Pass it as the function argument to ddply:
require(plyr)
ddply(df, .(hour), numcolwise(mean))
#  hour   price  cons  wind solar
#1    1 58.0735 55.21 47.42 48.10
#2    2 58.0735 53.50 47.36 48.91
#3    3 58.0735 52.10 50.13 48.56
#4    4 58.0735 49.78 46.17 53.33
#5    5 58.0735 49.46 50.40 49.29
#6    6 58.0735 49.59 55.66 50.27
...
Or use reshape2::dcast.
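The reshape2 route is only named above, without code, so the following is a guess at what it might look like rather than the answerer's own solution. It assumes the usual melt-then-cast pattern: `melt` stacks the value columns into long format, and `dcast` spreads them back out with one aggregated value per hour and variable. The data is a simplified stand-in, and the sketch is skipped if reshape2 is not installed.

```r
set.seed(1)

# Simplified stand-in data (assumption: two value columns only)
df <- data.frame(
  hour  = factor(rep(1:24, each = 100)),
  price = runif(2400) * 100,
  cons  = sample(1:100, 2400, replace = TRUE)
)

if (requireNamespace("reshape2", quietly = TRUE)) {
  # Long format: one row per (hour, variable, value)
  long <- reshape2::melt(df, id.vars = "hour")

  # Wide format again: one row per hour, one column per variable,
  # each cell holding the mean of that variable within that hour
  wide_means <- reshape2::dcast(long, hour ~ variable,
                                fun.aggregate = mean, value.var = "value")
  print(head(wide_means))
}
```

Like `numcolwise(mean)`, this yields one statistic per call; getting several statistics side by side would need one call per function or a different approach, such as the aggregate answer.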