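I am learning R. I want to generate a summary statistics table for a publication using simple, readable R code. The table should have the variables as rows and alternating mean and SD columns, with the two grouping variables also spread across the columns. All values should be rounded to two decimal places and keep trailing zeros (padding with zeros where necessary).
Using the mtcars dataset as an example, I would like the table to look like this (comparing cars with 4, 6 and 8 cylinders, automatic or manual):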
| |4 0 | |4 1 | |6 0 | |6 1 | |8 0 | |8 1 | |
|:----|:---------|:-------|:---------|:-------|:---------|:-------|:---------|:-------|:---------|:-------|:---------|:-------|
| |mean |(SD) |mean |(SD) |mean |(SD) |mean |(SD) |mean |(SD) |mean |(SD) |
|mpg |22.90 |(1.45) |28.07 |(4.48) |19.12 |(1.63) |20.57 |(0.75) |15.05 |(2.77) |15.40 |(0.57) |
|disp |135.87 |(13.97) |93.61 |(20.48) |204.55 |(44.74) |155.00 |(8.66) |357.62 |(71.82) |326.00 |(35.36) |
|hp |84.67 |(19.66) |81.88 |(22.66) |115.25 |(9.18) |131.67 |(37.53) |194.17 |(33.36) |299.50 |(50.20) |
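I wrote the following code, but I still need to create the first two header rows and add parentheses to the SD columns. To make the table suitable for publication, I am using R Markdown, knitr and kable. Is there a simpler, more standard, or more idiomatic way to do this?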
```{r Create-Table-1}
library(data.table)
library(knitr)
mtcars_dt <- data.table(mtcars)
myGroups <- c("cyl", "am")
myVariables <- c("mpg", "disp", "hp")
means_dt <- mtcars_dt[,lapply(.SD, mean), .SDcols = myVariables, by = myGroups]
means_dt.melted <- melt.data.table(means_dt, id.vars = myGroups, measure.vars = myVariables)
means_dt.melted$stat <- "mean"
sd_dt <- mtcars_dt[,lapply(.SD, sd), .SDcols=myVariables, by=myGroups]
sd_dt.melted <- melt.data.table(sd_dt, id.vars = myGroups, measure.vars = myVariables)
sd_dt.melted$stat <- "sd"
means_sd_merged_dt <- rbindlist(list(means_dt.melted, sd_dt.melted))
means_sd_dt <- dcast.data.table(means_sd_merged_dt, variable ~ cyl + am + stat, value.var = "value")
kable(means_sd_dt, digits = 2)
```
This is the table the code produces. The "8_1_mean" column is not padded to two decimal places. I tried pander, but it cannot add the trailing zeros.
|variable | 4_0_mean| 4_0_sd| 4_1_mean| 4_1_sd| 6_0_mean| 6_0_sd| 6_1_mean| 6_1_sd| 8_0_mean| 8_0_sd| 8_1_mean| 8_1_sd|
|:--------|--------:|------:|--------:|------:|--------:|------:|--------:|------:|--------:|------:|--------:|------:|
|mpg | 22.90| 1.45| 28.07| 4.48| 19.12| 1.63| 20.57| 0.75| 15.05| 2.77| 15.4| 0.57|
|disp | 135.87| 13.97| 93.61| 20.48| 204.55| 44.74| 155.00| 8.66| 357.62| 71.82| 326.0| 35.36|
|hp | 84.67| 19.66| 81.88| 22.66| 115.25| 9.18| 131.67| 37.53| 194.17| 33.36| 299.5| 50.20|
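A likely explanation for the missing zeros, sketched in base R: rounding changes a value but not how it is printed, and `format()` shows only as many decimals as the column needs. The vector below holds the three 8_1 means from the table above; the variable names are illustrative only.

```r
# The 8_1 means from the table above
x <- c(15.4, 326, 299.5)

# Rounding changes the value, not the number of decimals printed
rounded <- round(x, 2)

# format() uses only as many decimals as the column needs
plain <- trimws(format(rounded))

# nsmall sets a minimum number of decimals, so zeros are padded
padded <- trimws(format(rounded, nsmall = 2))

print(plain)
print(padded)
```

If this is the cause, `kable` has a `format.args` argument whose contents are passed on to `format()`, so `kable(means_sd_dt, digits = 2, format.args = list(nsmall = 2))` may be enough to pad the zeros. That is worth checking against the installed knitr version.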
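Update: One of the main reasons I posted this question was to see whether there is a simpler way to produce this kind of table, possibly with other libraries, and to learn coding best practices.
However, chinsoon12 provided a working answer, which I have incorporated into my first R function. I am posting it here so that others can modify and use it. It still has a bug that I have not been able to fix with digits and/or nsmall: sometimes a subgroup ends up with one more decimal digit than specified.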
```r
tabulatemsg <- function(variables, groups, input_dt, round_digits = 2, na.rm = FALSE) {
  # Create a table of alternating means and (SDs), for the specified variables, with groups as columns.
  require(data.table)

  # Aggregate means
  means_dt <- input_dt[, lapply(.SD, mean, na.rm = na.rm), .SDcols = variables, by = groups]
  means_dt.melted <- melt.data.table(means_dt, id.vars = groups, measure.vars = variables)
  means_dt.melted$stat <- "mean"

  # Aggregate standard deviations
  sd_dt <- input_dt[, lapply(.SD, sd, na.rm = na.rm), .SDcols = variables, by = groups]
  sd_dt.melted <- melt.data.table(sd_dt, id.vars = groups, measure.vars = variables)
  sd_dt.melted$stat <- "sd"

  # Merge and cast
  means_sd_merged_dt <- rbindlist(list(means_dt.melted, sd_dt.melted))
  means_sd_dt <- dcast.data.table(means_sd_merged_dt, paste("variable",
    paste(c(groups, "stat"), collapse = " + "), sep = " ~ "), value.var = "value")

  # Ensure there are the specified number of digits after the decimal
  cols <- setdiff(names(means_sd_dt), "variable")
  means_sd_dt[, (cols) := lapply(.SD, format, digits = round_digits, nsmall = round_digits, justify = "none"), .SDcols = cols]
  means_sd_dt[, (cols) := lapply(.SD, trimws), .SDcols = cols]

  # Add in parentheses
  cols <- names(means_sd_dt)[seq(3, ncol(means_sd_dt), by = 2)]
  means_sd_dt[, (cols) := lapply(.SD, function(x) paste0("(", x, ")")), .SDcols = cols]

  # Add in second row
  output_table <- rbindlist(list(
    data.table(t(c("", rep(c("Mean", "(SD)"), (ncol(means_sd_dt) - 1) / 2)))),
    means_sd_dt), use.names = FALSE)

  # Rename first row
  setnames(output_table, colnames(output_table),
    gsub("variable", "", (gsub(" sd", "", (gsub(" mean", "", (gsub("_", " ", colnames(means_sd_dt)))))))))

  return(output_table)
}
```
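One possible cause of the extra digit: in `format()`, `digits` counts significant digits rather than decimal places, so a column that contains a small value can be given more decimals than `nsmall` asks for. The sketch below uses made-up values and a hypothetical helper, `fmt_fixed`, built on `formatC()` with `format = "f"`, which always prints exactly the requested number of decimals.

```r
# Made-up values for illustration: one larger and one small number in a column
x <- c(15.05, 0.057)

# digits = 2 means two significant digits, so 0.057 forces three decimals
# for the whole column; nsmall = 2 is only a lower bound
with_format <- trimws(format(x, digits = 2, nsmall = 2))

# Hypothetical helper: fixed-point formatting with exactly `digits` decimals
fmt_fixed <- function(x, digits = 2) {
  formatC(x, format = "f", digits = digits)
}
fixed <- fmt_fixed(x)

print(with_format)
print(fixed)
```

Under this assumption, the `format` call in `tabulatemsg` could be replaced with `lapply(.SD, fmt_fixed, digits = round_digits)`, after which the `trimws` step should no longer be needed, because `formatC` does not pad to a common width by default.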
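Answer 0 (score: 1)
You can use `format` to convert each column to character so that at least 2 digits follow the decimal point, and then add the parentheses: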
```r
# ensure there are 2 digits after decimal
cols <- setdiff(names(means_sd_dt), "variable")
means_sd_dt[, (cols) := lapply(.SD, format, digits = 2, nsmall = 2L, justify = "none"), .SDcols = cols]
means_sd_dt[, (cols) := lapply(.SD, trimws), .SDcols = cols]

# add in parentheses
cols <- names(means_sd_dt)[seq(3, ncol(means_sd_dt), by = 2)]
means_sd_dt[, (cols) := lapply(.SD, function(x) paste0("(", x, ")")), .SDcols = cols]

# add in first row
outputTbl <- rbindlist(list(
  data.table(t(c("", rep(c("mean", "(SD)"), (ncol(means_sd_dt) - 1) / 2)))),
  means_sd_dt), use.names = FALSE)

kable(outputTbl, digits = 2)
```
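As a more compact variant of the same idea, the sketch below melts first, computes both statistics in a single grouped step, and formats them with `sprintf()` so that every value has exactly two decimals and the SDs already carry their parentheses. It is an untested-in-production sketch rather than a definitive solution; the names `long`, `stats`, `stats_long` and `wide` are illustrative, and the extra header rows are left out.

```r
library(data.table)

groups <- c("cyl", "am")
vars   <- c("mpg", "disp", "hp")

# Long format: one row per car and measured variable
long <- melt(as.data.table(mtcars), id.vars = groups, measure.vars = vars)

# Mean and SD per variable and group, formatted as fixed two-decimal strings
stats <- long[, .(mean = sprintf("%.2f", mean(value)),
                  sd   = sprintf("(%.2f)", sd(value))),
              by = c("variable", groups)]

# Stack the two statistics so they can alternate as columns
stats_long <- melt(stats, id.vars = c("variable", groups),
                   measure.vars = c("mean", "sd"),
                   variable.name = "stat", value.name = "value")

# Wide format: one row per variable, a mean/sd pair for each cyl-am group
wide <- dcast(stats_long, variable ~ cyl + am + stat, value.var = "value")

print(wide)
```

Because the values are already character strings, `kable(wide)` should print them unchanged, so the `digits` argument is not needed here.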