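I want to use R to run some descriptive statistics on the following dataset. I tried the melt and dcast functions from the reshape package, but could not get the expected result.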
Age MARK_Science MARK_Maths MARK_IT
30 98 78 NA
40 99 NA 91
26 NA 98 72
NA 76 99 98
29 88 NA 69
26 NA NA 56
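To make the example reproducible, the sample data above can be read into R roughly like this. This is a minimal sketch; the name `marks` is just an illustrative choice, not something from the question.

```r
# Re-enter the sample data from the question as a data frame.
# read.table() treats the literal string "NA" as a missing value by default
# (na.strings = "NA"), and blank lines in the text are skipped.
marks <- read.table(text = "
Age MARK_Science MARK_Maths MARK_IT
30 98 78 NA
40 99 NA 91
26 NA 98 72
NA 76 99 98
29 88 NA 69
26 NA NA 56
", header = TRUE)

str(marks)
```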
I would like the output in the following format:
Age MARK_Science MARK_Maths MARK_IT
Mean
Median
Total Observation
Missing Observation
% Missing Observation
SD
How would I produce a table like this? Could you post code that gives output in the required format? Any suggestions?
Answer 0 (score: 0)
You don't need to melt or reshape anything. Just apply a few summary statistics to each column and bind the results together:
# example data
set.seed(256)
df <- data.frame(age  = rnorm(100, mean = 30, sd = 7),
                 sci  = runif(100, 0, 100),
                 math = runif(100, 0, 100),
                 it   = runif(100, 0, 100))
# insert missing values: for each column, draw a random set of row positions
# (simplify = FALSE keeps s a list even if all four draws happen to have the same length)
s <- replicate(4, sample(1:100, rpois(1, lambda = 10), replace = FALSE), simplify = FALSE)
for (i in 1:4) {
  df[[i]][s[[i]]] <- NA
}
# tabulate: one row per statistic, one column per variable
t <- rbind(apply(df, 2, mean, na.rm = TRUE),
           apply(df, 2, median, na.rm = TRUE),
           apply(df, 2, length),
           apply(df, 2, function(j) sum(is.na(j))),
           apply(df, 2, function(j) sum(is.na(j))) / nrow(df) * 100)
rownames(t) <- c("mean", "median", "n", "n_miss", "pct_miss")
R> t
age sci math it
mean 30.41222 52.16733 46.58483 49.99577
median 30.84088 51.76666 47.91840 47.42555
n 100.00000 100.00000 100.00000 100.00000
n_miss 18.00000 9.00000 9.00000 17.00000
pct_miss 18.00000 9.00000 9.00000 17.00000
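The answer above leaves out the SD row that the question asks for. A minimal sketch that adds it and uses the question's row labels, applied to the question's own data, could look like this (the names `marks` and `summarise_cols` are illustrative, not part of the original answer):

```r
# Question's data typed in directly.
marks <- data.frame(
  Age          = c(30, 40, 26, NA, 29, 26),
  MARK_Science = c(98, 99, NA, 76, 88, NA),
  MARK_Maths   = c(78, NA, 98, 99, NA, NA),
  MARK_IT      = c(NA, 91, 72, 98, 69, 56)
)

# Hypothetical helper: one row per statistic, one column per variable.
summarise_cols <- function(df) {
  n_miss <- colSums(is.na(df))  # missing values per column
  rbind(
    "Mean"                  = colMeans(df, na.rm = TRUE),
    "Median"                = apply(df, 2, median, na.rm = TRUE),
    "Total Observation"     = nrow(df),  # recycled across all columns
    "Missing Observation"   = n_miss,
    "% Missing Observation" = 100 * n_miss / nrow(df),
    "SD"                    = apply(df, 2, sd, na.rm = TRUE)
  )
}

res <- summarise_cols(marks)
round(res, 2)
```

Column names come from the named vector returned by colMeans(), and row names from the argument names passed to rbind().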
Answer 1 (score: 0)
Here is a solution using sapply():
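# summary statistics for a single column; sapply() applies it to every column
my.agg <- function(x)
  c(Mean = mean(x, na.rm = TRUE), Median = median(x, na.rm = TRUE),
    Total.n = length(x), Pct.na = 100 * sum(is.na(x)) / length(x), Sd = sd(x, na.rm = TRUE))
# df is the example data frame from the previous answer
sapply(df, FUN = my.agg)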
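A stricter variant of the sapply() call uses vapply(), whose FUN.VALUE argument declares that every column must return the same six numbers. If one column ever yields a different length or type, vapply() raises an error instead of quietly returning a list. This is a sketch applied to the question's data; `stats_row` and `template` are hypothetical names, and the missing-value count is included because the question asks for it.

```r
marks <- data.frame(
  Age          = c(30, 40, 26, NA, 29, 26),
  MARK_Science = c(98, 99, NA, 76, 88, NA),
  MARK_Maths   = c(78, NA, 98, 99, NA, NA),
  MARK_IT      = c(NA, 91, 72, 98, 69, 56)
)

# Hypothetical per-column summary returning six named doubles.
stats_row <- function(x) {
  n_miss <- sum(is.na(x))
  c(Mean    = mean(x, na.rm = TRUE),
    Median  = median(x, na.rm = TRUE),
    Total   = length(x),
    Miss    = n_miss,
    PctMiss = 100 * n_miss / length(x),
    SD      = sd(x, na.rm = TRUE))
}

# Template: six doubles; its names become the row names of the result.
template <- c(Mean = 0, Median = 0, Total = 0, Miss = 0, PctMiss = 0, SD = 0)
out <- vapply(marks, stats_row, FUN.VALUE = template)
round(out, 2)
```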