Question

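I am using R to work with some statistics, such as means, medians, and correlations. Is there a way to display the results of my code in table form? Here is a sample of my code: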

cor(Moisture, pH)
cor(Moisture, NH3)
cor(Moisture, NH3DM) 
cor(Moisture, NDM) 
cor(Moisture, N)
cor(Moisture, TKN)
cor(Moisture, X.Ash)
cor(Moisture, CN)
cor(Moisture, EC1.5)

The current output in the R console is:

> cor(Moisture, pH)
[1] -0.03154892
> cor(Moisture, NH3)
[1] -0.2814583
> cor(Moisture, NH3DM) 
[1] -0.1099614
> cor(Moisture, NDM) 
[1] 0.08306996
> cor(Moisture, N)
[1] -0.3728169
> cor(Moisture, TKN)
[1] 0.06975473
> cor(Moisture, X.Ash)
[1] -0.2749583
> cor(Moisture, CN)
[1] 0.002943823
> cor(Moisture, EC1.5)
[1] -0.4049512

I find this messy, and for my class we would like to use only R.

Any ideas?

Answer 1

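Here is a solution that collects univariate (e.g., mean, SD) and bivariate (e.g., correlation) statistics into a single object, which you can then send to your favorite Markdown option (e.g., kable) and turn into a nice summary table.

I'm using the mtcars data because it is easy to get and has plenty of numeric variables. The comments in my sample code explain my thinking along the way. The descr() function is great at providing summary statistics, but extracting the data into a usable form is a bit annoying. Also note that fashion() converts the correlations to text values, which is fine because at this point they are for tabling, not calculation.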

library(dplyr) # for pipes & select()
library(corrr) # for correlation stuff
library(summarytools) # for descr()
library(stringr)  # for str_to_title()

# load testing data
data("mtcars")
# subset data to drop the binary variables (or any others you don't want); could code as keep variables, but I have fewer here to drop
d1 <- mtcars %>% select(-vs, -am)
# sort so columns are in alphabetical order, this will help ensure data quality in the next steps
d1 <- d1[,order(colnames(d1))]

# get & store quant summary stats (summarytools function)
d2 <- descr(d1)
# save names for later
names.stats <- rownames(d2)
names.var <- colnames(d2)
# extract stored summary stats into usable dataframe
d3 <- as.vector(d2[1:7, 1:ncol(d2)])
d3 <- round(as.data.frame(matrix(d3, nrow=7, ncol=ncol(d2), byrow=FALSE)),2)
colnames(d3) <- str_to_title(names.var)

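# save correlation matrix as dataframe
# note: this pipeline starts from d3, so it correlates the summary statistics (7 rows per variable), which is what the output below shows; start from d1 instead to correlate the raw observations
c1 <- d3 %>% select_if(is.numeric) %>% correlate() %>% shave(upper=TRUE) %>% fashion(leading_zeros=TRUE, decimals = 2, na_print = "—")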
colnames(c1) <- paste0("Cor.", colnames(c1))

# transpose & add back summary stat names
d3 <- t(d3)
colnames(d3) <- names.stats[1:7]

# add correlations to rest of the summary stats
d3 <- cbind(d3, c1[,2:10])

It's not the prettiest output for SO, but here is what you should have now. It is ready to send to your favorite Markdown function for presentation-quality tabling.

       Mean Std.Dev   Min     Q1 Median     Q3    Max Cor.Carb Cor.Cyl Cor.Disp Cor.Drat Cor.Gear Cor.Hp Cor.Mpg Cor.Qsec Cor.Wt
Carb   2.81    1.62  1.00   2.00   2.00   4.00   8.00        —       —        —        —        —      —       —        —      —
Cyl    6.19    1.79  4.00   4.00   6.00   8.00   8.00     0.74       —        —        —        —      —       —        —      —
Disp 230.72  123.94 71.10 120.65 196.30 334.00 472.00     0.96    0.85        —        —        —      —       —        —      —
Drat   3.60    0.53  2.76   3.08   3.70   3.92   4.93     0.69    0.93     0.73        —        —      —       —        —      —
Gear   3.69    0.74  3.00   3.00   4.00   4.00   5.00     0.67    0.93     0.72     1.00        —      —       —        —      —
Hp   146.69   68.56 52.00  96.00 123.00 180.00 335.00     0.99    0.80     0.98     0.76     0.75      —       —        —      —
Mpg   20.09    6.03 10.40  15.35  19.20  22.80  33.90     0.91    0.92     0.93     0.92     0.91   0.95       —        —      —
Qsec  17.85    1.79 14.50  16.88  17.71  18.90  22.90     0.60    0.87     0.63     0.99     0.98   0.67    0.86        —      —
Wt     3.22    0.98  1.51   2.54   3.33   3.65   5.42     0.89    0.91     0.92     0.91     0.90   0.94    1.00     0.85      —
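
The last step, sending a summary table to a Markdown function, might look roughly like the sketch below. It assumes the knitr package is installed, and it builds a small stand-in table from three mtcars columns (the variable choice and the names vars, tab, and out are illustrative only) so that it runs on its own rather than relying on d3 from above.

```r
library(knitr)  # for kable(); assumed to be installed

data("mtcars")

# Illustrative subset of variables (an assumption, not part of the answer above)
vars <- c("mpg", "hp", "wt")

# One row per variable, one column per summary statistic
tab <- t(sapply(mtcars[vars], function(x) {
  c(Mean = mean(x), Std.Dev = sd(x), Median = median(x))
}))

# kable() turns the matrix into a Markdown-style table; digits rounds for display only
out <- kable(tab, digits = 2)
print(out)
```

In an R Markdown document the kable() call can sit in a chunk by itself and the table is rendered in the output.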

Answer 2

Store all the data you want to display in a data frame, df. Then use the stargazer package.

install.packages("stargazer")
library(stargazer)
stargazer(df, align = TRUE, type = "text")
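
As a rough sketch of how such a df could be built for the correlations in the question: cor() accepts a data frame as its second argument, so the separate pairwise calls can collapse into one. The mtcars data and the choice of mpg as the target are stand-ins (assumptions) for the asker's data and its Moisture column, which are not available here.

```r
# Stand-in data: mtcars replaces the asker's data set
data("mtcars")

# Hypothetical choice: mpg plays the role of Moisture
target <- "mpg"
others <- setdiff(names(mtcars), target)

# cor(x, y) with a data frame y returns a 1 x k matrix,
# one correlation per column of y
r <- cor(mtcars[[target]], mtcars[, others])

# Collect the results in a data frame, one row per variable
df <- data.frame(
  variable    = others,
  correlation = round(as.vector(r), 3)
)

print(df)
```

The resulting df can then be passed to stargazer. Since stargazer prints summary statistics for a data frame by default, adding summary = FALSE to the call should print the rows as stored.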

Displaying statistics in a table using R