我想使用rmarkdown创建一个表格,其中每个单元格都有两个值,例如3.1 (0.05)
或78 ± 23.3
。这些类型的表在科学文献(like ones with bold values)中非常常见,我们希望紧凑地显示平均值和标准差,或者加值减去一些错误项。因此,在使用Rmarkdown时,有一种简单的方法来生成它们会很有用。例如:
# my table
mtcars
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
[snipped]
# my other table, that I want to combine with the first
some_error_term_for_mtcars <- data.frame(sapply(1:ncol(mtcars), function(i) sample(x = (min(mtcars[, i])/10):max(mtcars[, i])/10, nrow(mtcars), replace = TRUE)))
some_error_term_for_mtcars
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11
1 2.704 0.44 26.011 3.92 0.4276 0.21513 1.145 0.0 0.0 0.03 0.41
2 0.604 0.44 5.211 6.32 0.0276 0.01513 1.345 0.1 0.1 0.33 0.21
3 3.304 0.14 31.511 20.42 0.1276 0.51513 0.145 0.1 0.0 0.43 0.71
4 1.004 0.44 16.011 26.02 0.2276 0.11513 1.345 0.1 0.0 0.03 0.31
5 2.604 0.34 4.311 30.02 0.0276 0.31513 1.745 0.1 0.1 0.23 0.41
6 2.404 0.64 8.011 27.92 0.1276 0.21513 1.145 0.0 0.1 0.33 0.41
7 2.804 0.14 4.811 14.92 0.1276 0.01513 0.345 0.1 0.0 0.13 0.31
[snipped]
在rmarkdown中组合这两个表以生成单个单元格可以包含21 (0.904)
或21 ± 0.904
等内容的最简单方法是什么?
答案 0 :(得分:4)
我们可以这样做,然后使用knitr::kable
来获得降价:
two_tables_into_one <- as.data.frame(do.call(cbind, lapply(1:ncol(mtcars), function(i) paste0(mtcars[ , i], " (", some_error_term_for_mtcars[ , i], ")" ) )))
names(two_tables_into_one) <- names(mtcars)
head(two_tables_into_one)
mpg cyl disp hp drat wt qsec vs
1 21 (2.704) 6 (0.44) 160 (26.011) 110 (3.92) 3.9 (0.4276) 2.62 (0.21513) 16.46 (1.145) 0 (0)
2 21 (0.604) 6 (0.44) 160 (5.211) 110 (6.32) 3.9 (0.0276) 2.875 (0.01513) 17.02 (1.345) 0 (0.1)
3 22.8 (3.304) 4 (0.14) 108 (31.511) 93 (20.42) 3.85 (0.1276) 2.32 (0.51513) 18.61 (0.145) 1 (0.1)
4 21.4 (1.004) 6 (0.44) 258 (16.011) 110 (26.02) 3.08 (0.2276) 3.215 (0.11513) 19.44 (1.345) 1 (0.1)
5 18.7 (2.604) 8 (0.34) 360 (4.311) 175 (30.02) 3.15 (0.0276) 3.44 (0.31513) 17.02 (1.745) 0 (0.1)
6 18.1 (2.404) 6 (0.64) 225 (8.011) 105 (27.92) 2.76 (0.1276) 3.46 (0.21513) 20.22 (1.145) 1 (0)
am gear carb
1 1 (0) 4 (0.03) 4 (0.41)
2 1 (0.1) 4 (0.33) 4 (0.21)
3 1 (0) 4 (0.43) 1 (0.71)
4 0 (0) 3 (0.03) 1 (0.31)
5 0 (0.1) 3 (0.23) 2 (0.41)
6 0 (0.1) 3 (0.33) 1 (0.41)
knitr::kable(head(two_tables_into_one))
或正负分隔符:
two_tables_into_one <- as.data.frame(do.call(cbind, lapply(1:ncol(mtcars), function(i) paste0(mtcars[ , i], " ± ", some_error_term_for_mtcars[ , i] ) )))
names(two_tables_into_one) <- names(mtcars)
head(two_tables_into_one)
mpg cyl disp hp
1 21 ± 2.704 6 ± 0.44 160 ± 26.011 110 ± 3.92
2 21 ± 0.604 6 ± 0.44 160 ± 5.211 110 ± 6.32
3 22.8 ± 3.304 4 ± 0.14 108 ± 31.511 93 ± 20.42
4 21.4 ± 1.004 6 ± 0.44 258 ± 16.011 110 ± 26.02
5 18.7 ± 2.604 8 ± 0.34 360 ± 4.311 175 ± 30.02
6 18.1 ± 2.404 6 ± 0.64 225 ± 8.011 105 ± 27.92
drat wt qsec
1 3.9 ± 0.4276 2.62 ± 0.21513 16.46 ± 1.145
2 3.9 ± 0.0276 2.875 ± 0.01513 17.02 ± 1.345
3 3.85 ± 0.1276 2.32 ± 0.51513 18.61 ± 0.145
4 3.08 ± 0.2276 3.215 ± 0.11513 19.44 ± 1.345
5 3.15 ± 0.0276 3.44 ± 0.31513 17.02 ± 1.745
6 2.76 ± 0.1276 3.46 ± 0.21513 20.22 ± 1.145
vs am gear carb
1 0 ± 0 1 ± 0 4 ± 0.03 4 ± 0.41
2 0 ± 0.1 1 ± 0.1 4 ± 0.33 4 ± 0.21
3 1 ± 0.1 1 ± 0 4 ± 0.43 1 ± 0.71
4 1 ± 0.1 0 ± 0 3 ± 0.03 1 ± 0.31
5 0 ± 0.1 0 ± 0.1 3 ± 0.23 2 ± 0.41
6 1 ± 0 0 ± 0.1 3 ± 0.33 1 ± 0.41
knitr::kable(head(two_tables_into_one))
但这as.data.frame(do.call(cbind, lapply...
似乎有点尴尬。有更简洁的方式吗?
答案 1 :(得分:3)
以下解决方案基于 this 来回答这个问题。
本质上,问题需要分解为两部分:第一,如何组合两个表格,第二,如何将结果呈现为 HTML 等。
让我们用两个数据框演示解决方案:
my_mtcars
df_random_vals
library(tibble)
library(dplyr, warn.conflicts = FALSE)
## part 1 -- create `my_mtcars`
##############################
my_mtcars <-
mtcars %>%
rownames_to_column("cars") %>%
as_tibble()
## part 2 -- create `df_random_vals` based on `my_mtcars` dimensions
####################################################################
dim_my_mtcars <- dim(my_mtcars)
target_nrows <- dim_my_mtcars[1]
target_ncols <- dim_my_mtcars[2]
set.seed(2021)
my_mat <-
matrix(data = rnorm(target_nrows*target_ncols),
ncol = target_ncols,
nrow = target_nrows)
df_random_vals <-
my_mat %>%
as.data.frame() %>%
as_tibble() %>%
mutate(across(everything(), round, 3)) ## just so we have shorter decimals
## part 3 -- test `my_mtcars` and `df_random_vals` are of the same dimensions as intended
#########################################################################################
identical(
dim(df_random_vals),
dim(my_mtcars)
)
#> [1] TRUE
## part 4 -- set a general custom function for how to paste values together
#################################################################
my_paste <- function(x, y) {
paste0(x, " ± ", y)
}
my_paste(1, 2)
#> [1] "1 ± 2"
## part 5 -- join the datasets
##############################
library(purrr)
output <- map2_dfr(my_mtcars, df_random_vals, my_paste) # https://stackoverflow.com/a/68541960/6105259
output
#> # A tibble: 32 x 12
#> cars mpg cyl disp hp drat wt qsec vs am gear carb
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Mazda ~ 21 ± ~ 6 ± 0~ 160 ±~ 110 ±~ 3.9 ~ 2.62~ 16.4~ 0 ± ~ 1 ± ~ 4 ± ~ 4 ± ~
#> 2 Mazda ~ 21 ± ~ 6 ± -~ 160 ±~ 110 ±~ 3.9 ~ 2.87~ 17.0~ 0 ± ~ 1 ± ~ 4 ± ~ 4 ± ~
#> 3 Datsun~ 22.8 ~ 4 ± 0~ 108 ±~ 93 ± ~ 3.85~ 2.32~ 18.6~ 1 ± ~ 1 ± ~ 4 ± ~ 1 ± ~
#> 4 Hornet~ 21.4 ~ 6 ± 1~ 258 ±~ 110 ±~ 3.08~ 3.21~ 19.4~ 1 ± ~ 0 ± ~ 3 ± ~ 1 ± ~
#> 5 Hornet~ 18.7 ~ 8 ± -~ 360 ±~ 175 ±~ 3.15~ 3.44~ 17.0~ 0 ± ~ 0 ± ~ 3 ± ~ 2 ± ~
#> 6 Valian~ 18.1 ~ 6 ± -~ 225 ±~ 105 ±~ 2.76~ 3.46~ 20.2~ 1 ± ~ 0 ± ~ 3 ± ~ 1 ± ~
#> 7 Duster~ 14.3 ~ 8 ± -~ 360 ±~ 245 ±~ 3.21~ 3.57~ 15.8~ 0 ± ~ 0 ± ~ 3 ± 0 4 ± ~
#> 8 Merc 2~ 24.4 ~ 4 ± -~ 146.7~ 62 ± ~ 3.69~ 3.19~ 20 ±~ 1 ± ~ 0 ± ~ 4 ± ~ 2 ± ~
#> 9 Merc 2~ 22.8 ~ 4 ± -~ 140.8~ 95 ± ~ 3.92~ 3.15~ 22.9~ 1 ± ~ 0 ± ~ 4 ± ~ 2 ± ~
#> 10 Merc 2~ 19.2 ~ 6 ± 1~ 167.6~ 123 ±~ 3.92~ 3.44~ 18.3~ 1 ± ~ 0 ± ~ 4 ± ~ 4 ± ~
#> # ... with 22 more rows
由 spin-off question I asked (v2.0.0) 于 2021 年 7 月 27 日创建
最后,我们渲染:
library(kableExtra)
## based on this one: https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html#Bootstrap_theme
output %>%
kbl() %>%
kable_styling()
场景
我们有两个相互对应的数据框。也就是说,它们具有相同的维度,并且相同位置的单元格中的值在表之间具有内在关联。一个例子是一张带有均值的表,另一张表带有每个均值的标准误差。
想要的操作
我们想要一张表格而不是两张表格,它将各个单元格组合在一起(也称为“连接”)一个单元格。
所需的输出
我们希望将单个输出表呈现为 HTML。
必要的代码
my_paste <- function(x, y) {
paste0(x, " ± ", y)
}
output <- map2_dfr(df1, df2, my_paste)
output %>%
kbl() %>%
kable_styling()
答案 2 :(得分:2)
我在summarytools包中使用了以下技术(您可以查看descr()
和print.summarytools()
的源代码以获取所有详细信息。)
> install.packages("devtools")
> library(devtools)
> install_github('dcomtois/summarytools')
> library(summarytools)
> obs <- descr(iris)$observ
> obs
Sepal.Length Sepal.Width Petal.Length Petal.Width
Valid "150 (100%)" "150 (100%)" "150 (100%)" "150 (100%)"
<NA> "0 (0%)" "0 (0%)" "0 (0%)" "0 (0%)"
Total "150 (100%)" "150 (100%)" "150 (100%)" "150 (100%)"
$observ
数据帧是以这种方式构造的 - 它是更大循环的一部分,因此是i
迭代器。请注意,数据帧稍后将在代码中进行转换。
output$observ[i,] <- c(paste0(n.valid, " (", p.valid, "%)"),
paste0(n.NA, " (", p.NA, "%)"),
paste(n.valid + n.NA, "(100%)"))
然后使用pander生成 rmarkdown 表格,我们可以这样做:
> library(pander)
> pander(x = obs, style="rmarkdown")
| | Sepal.Length | Sepal.Width | Petal.Length |
|:-----------:|:--------------:|:-------------:|:--------------:|
| **Valid** | 150 (100%) | 150 (100%) | 150 (100%) |
| **<NA>** | 0 (0%) | 0 (0%) | 0 (0%) |
| **Total** | 150 (100%) | 150 (100%) | 150 (100%) |
Table: Table continues below
| | Petal.Width |
|:-----------:|:-------------:|
| **Valid** | 150 (100%) |
| **<NA>** | 0 (0%) |
| **Total** | 150 (100%) |
以下是descr()
函数的完整输出:
> descr(iris, style = "rmarkdown", plain.ascii = FALSE)
Non-numerical variable(s) ignored: Species
Descriptive Statistics
Dataframe: iris
| | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width |
|------------------:|---------------:|--------------:|---------------:|--------------:|
| **Mean** | 5.84 | 3.06 | 3.76 | 1.2 |
| **Std.Dev** | 0.83 | 0.44 | 1.77 | 0.76 |
| **Min** | 4.3 | 2 | 1 | 0.1 |
| **Max** | 7.9 | 4.4 | 6.9 | 2.5 |
| **Median** | 5.8 | 3 | 4.35 | 1.3 |
| **mad** | 1.04 | 0.44 | 1.85 | 1.04 |
| **IQR** | 1.3 | 0.5 | 3.5 | 1.5 |
| **CV** | 7.06 | 7.01 | 2.13 | 1.57 |
| **Skewness** | 0.31 | 0.31 | -0.27 | -0.1 |
| **SE.Skewness** | 0.2 | 0.2 | 0.2 | 0.2 |
| **Kurtosis** | -0.61 | 0.14 | -1.42 | -1.36 |
Observations
| | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width |
|------------:|---------------:|--------------:|---------------:|--------------:|
| **Valid** | 150 (100%) | 150 (100%) | 150 (100%) | 150 (100%) |
| **<NA>** | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) |
| **Total** | 150 (100%) | 150 (100%) | 150 (100%) | 150 (100%) |
现在,对于来自2个不同数据集的组合数据,一个好的旧for
循环可以很好地完成这项任务:
names(some_error_term_for_mtcars) <- names(mtcars)
new.df <- mtcars
for (n in names(mtcars)) {
new.df[,n] <- paste(mtcars[,n], "±",round(some_error_term_for_mtcars[,n],2))
}
pander(new.df, style="rmarkdown")
部分输出:
| | mpg | cyl | disp |
|:-------------------------:|:----------:|:--------:|:-------------:|
| **Mazda RX4** | 21 ± 2 | 6 ± 0.04 | 160 ± 33.61 |
| **Mazda RX4 Wag** | 21 ± 0.8 | 6 ± 0.14 | 160 ± 26.11 |
| **Datsun 710** | 22.8 ± 0.1 | 4 ± 0.64 | 108 ± 45.81 |
| **Hornet 4 Drive** | 21.4 ± 1.7 | 6 ± 0.04 | 258 ± 33.81 |
| **Hornet Sportabout** | 18.7 ± 2.7 | 8 ± 0.54 | 360 ± 37.81 |
| **Valiant** | 18.1 ± 3.3 | 6 ± 0.14 | 225 ± 36.31 |
| **Duster 360** | 14.3 ± 0.1 | 8 ± 0.24 | 360 ± 2.01 |
| **Merc 240D** | 24.4 ± 2.3 | 4 ± 0.14 | 146.7 ± 8.81 |
| **Merc 230** | 22.8 ± 1.7 | 4 ± 0.04 | 140.8 ± 43.91 |
| **Merc 280** | 19.2 ± 1.5 | 6 ± 0.24 | 167.6 ± 6.91 |
| **Merc 280C** | 17.8 ± 3 | 6 ± 0.14 | 167.6 ± 27.11 |
| **Merc 450SE** | 16.4 ± 3 | 8 ± 0.34 | 275.8 ± 11.21 |
| **Merc 450SL** | 17.3 ± 2.8 | 8 ± 0.14 | 275.8 ± 32.21 |
| **Merc 450SLC** | 15.2 ± 0.3 | 8 ± 0.44 | 275.8 ± 11.61 |