Question

I am struggling to write a function for a dataset that looks like this:

identifier   age   occupation        
pers1        18    student   
pers2        45    teacher   
pers3        65    retired

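What I would like to do is write a function that:

1. Classifies my variables as numeric vs. factor variables
2. For numeric variables, gives me the mean, min and max
3. For factor variables, gives me a frequency table
4. Returns points (2) and (3) in a "pretty" format (data frame, vector or table)

So far, I have tried this: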

describe <- function(x) {
  if (is.numeric(x)) {
    mean <- mean(x)
    min <- min(x)
    max <- max(x)
    d <- data.frame(mean, min, max)
  } else {
    factor <- table(x)
  }
}
stats <- lapply(data, describe)

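Problem: `stats` is now a list that is hard to read and hard to export to Excel or share. I don't know how to make the list `stats` more readable.

Or perhaps there is a better way to build the `describe` function?

Any ideas on how to solve these two problems are greatly appreciated!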

Answer 1

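I'm late to the party, but maybe you still need a solution. I combined the answers from some of the comments on your post into the following code. It assumes you only have numeric columns and factors, and it scales to a large number of columns, as you specified: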

# Just some sample data for my example; you don't need ggplot2 otherwise.
library(ggplot2)
data <- diamonds

# Find which columns are numeric and which are not.
# is.numeric() also catches integer columns such as diamonds$price,
# which a comparison against the class name "numeric" would miss.
numeric <- sapply(data, is.numeric)

# Create the summary objects.
summ_numeric <- summary(data[numeric])
summ_non_numeric <- summary(data[!numeric])

# The result is easily written to csv.
write.csv(summ_non_numeric, file = "test.csv")
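If you want exactly the mean, min and max per numeric variable, plus a frequency table per factor, rather than the full summary() output, something along these lines might work. This is only a minimal base R sketch: the helper names `describe_numeric` and `describe_factor` and the sample data frame `people` are made up for illustration, and it assumes every non-numeric column should be tabulated.

```r
# Hypothetical helper: one row per numeric column with its mean, min and max.
describe_numeric <- function(df) {
  num <- df[vapply(df, is.numeric, logical(1))]
  data.frame(
    variable = names(num),
    mean = vapply(num, function(x) mean(x, na.rm = TRUE), numeric(1)),
    min  = vapply(num, function(x) as.numeric(min(x, na.rm = TRUE)), numeric(1)),
    max  = vapply(num, function(x) as.numeric(max(x, na.rm = TRUE)), numeric(1)),
    row.names = NULL
  )
}

# Hypothetical helper: one row per (variable, level) pair with its count.
# Returns NULL when the data frame has no non-numeric columns.
describe_factor <- function(df) {
  fac <- df[!vapply(df, is.numeric, logical(1))]
  rows <- lapply(names(fac), function(nm) {
    counts <- table(fac[[nm]])
    data.frame(
      variable = rep(nm, length(counts)),
      level = names(counts),
      n = as.vector(counts),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Sample data modelled on the table in the question.
people <- data.frame(
  identifier = c("pers1", "pers2", "pers3"),
  age = c(18, 45, 65),
  occupation = c("student", "teacher", "retired"),
  stringsAsFactors = TRUE
)

numeric_summary <- describe_numeric(people)
factor_summary <- describe_factor(people)

# Both results are plain data frames, so they export cleanly.
write.csv(numeric_summary, file.path(tempdir(), "numeric_summary.csv"), row.names = FALSE)
write.csv(factor_summary, file.path(tempdir(), "factor_summary.csv"), row.names = FALSE)
```

Because both results are ordinary data frames with one value per cell, they open in Excel without the reshaping that a list of mixed objects would need.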

Hope this helps.

Answer 2

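The functionality you need is already available elsewhere, so if you are not interested in coding it yourself, you can use that. The Publish package can be used to generate tables for presentation in papers. At the time of writing it is not on CRAN, but you can install it from GitHub: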

devtools::install_github('tagteam/Publish')
library(Publish)
library(isdals)  # Get some data
data(fev)        
fev$Smoke <- factor(fev$Smoke, levels=0:1, labels=c("No", "Yes"))
fev$Gender <- factor(fev$Gender, levels=0:1, labels=c("Girl", "Boy"))

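univariateTable can generate a publication-ready table that describes the data. By default, univariateTable computes the mean and standard deviation for numeric variables, and the distribution of observations across categories for factors. These values can be computed and compared across groups. The main input to univariateTable is a formula: the right-hand side lists the variables to include in the table, and the left-hand side, if present, specifies a grouping variable.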

univariateTable(Smoke ~ Age + Ht + FEV + Gender, data=fev)

This produces the following output:

  Variable     Level No (n=589) Yes (n=65) Total (n=654) p-value
1      Age mean (sd)  9.5 (2.7) 13.5 (2.3)     9.9 (3.0)  <1e-04
2       Ht mean (sd) 60.6 (5.7) 66.0 (3.2)    61.1 (5.7)  <1e-04
3      FEV mean (sd)  2.6 (0.9)  3.3 (0.7)     2.6 (0.9)  <1e-04
4   Gender      Girl 279 (47.4)  39 (60.0)    318 (48.6)        
5                Boy 310 (52.6)  26 (40.0)    336 (51.4)  0.0714
