R - Exporting a numSummary to csv

Time: 2017-04-25 07:53:54

Tags: r csv

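I'm having some trouble exporting some information from R. The information is a numerical summary (numSummary), and I can convert most of the summary to csv using the advice linked here, which is basically just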

write.csv(numsummary$table)

but every time I use it, the last column is dropped from the csv output.

I haven't found a way to get the last column into the csv output. Does anyone know how to do this, or can you point me to a resource I can check to learn how?

Please let me know if I can provide any more useful information, and thanks in advance for your help!

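Edit: Below is a complete R script for an example where the last column - in this case with the column heading 'n' - is cut off. Using write.csv(input$table) seems to leave the last column out of any type of output I use it on, not just numerical summaries.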

#start toothGrowth csv generation
#dataset available at https://vincentarelbundock.github.io/Rdatasets/csv/datasets/ToothGrowth.csv

toothGrowth <- read.table("ToothGrowth.csv", header=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE)
numSumTooth <- numSummary(toothGrowth[,c("dose", "len", "X")], statistics=c("mean", "sd", "IQR", "quantiles"), quantiles=c(0,.25,.5,.75,1))
str(toothGrowth)
numSumTooth
write.csv(numSumTooth$table, file="numSumTooth.csv")

#end toothGrowth csv generation

The output I get from the script above is linked on pastebin: sumSumTooth

2 answers:

Answer 0: (score: 0)

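The "n" column is missing because that value is stored in numSummaryObj$n, while the other exploratory statistics are stored in numSummaryObj$table. Writing only $table therefore never includes n.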

Putting it back takes a simple cbind or data.frame command:

file <- "https://vincentarelbundock.github.io/Rdatasets/csv/datasets/ToothGrowth.csv"
toothGrowth  <- read.table(file, header=T, sep=",", row.names=1, na.strings="NA", dec=".", strip.white=TRUE)

numSumTooth <- RcmdrMisc::numSummary(toothGrowth[, c("len", "dose")])

nST <- data.frame(numSumTooth$table, numSumTooth$n)
names(nST) <- c(colnames(numSumTooth$table), "n")

write.csv(nST, "numSumTooth.csv")
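
The answer mentions cbind but only shows the data.frame route, so here is a minimal sketch of the cbind variant. It does not call numSummary: `fake_summary` is a hypothetical stand-in built with base R from the built-in ToothGrowth data, and it is only assumed to have the shape described above (a `table` matrix plus a separate `n` vector). `csv_path` is likewise just a temporary file for illustration.

```r
# Hypothetical stand-in for a numSummary result: a statistics matrix in $table
# and the counts in a separate $n element (assumed shape, built with base R).
vars <- c("len", "dose")
fake_summary <- list(
  table = t(sapply(ToothGrowth[vars],
                   function(x) c(mean = mean(x), sd = sd(x), quantile(x)))),
  n     = sapply(ToothGrowth[vars], function(x) sum(!is.na(x)))
)

# cbind keeps the result a matrix, so column names such as "0%" are not altered
# and no separate names<- step is needed.
with_n <- cbind(fake_summary$table, n = fake_summary$n)

csv_path <- tempfile(fileext = ".csv")
write.csv(with_n, csv_path)

# Read the file back to confirm that the n column made it into the csv.
check <- read.csv(csv_path, row.names = 1, check.names = FALSE)
colnames(check)
```

With a real numSummary object the same cbind call should apply as long as $n has one entry per row of $table, which is what the answer above describes for the ungrouped case.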

==

Edit:

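Personally, I would invest in learning packages like dplyr and tidyr for data processing, since they will give you a lot of mileage and flexibility in the future. For example, to produce the same numSummary as a data.frame, you could run the following: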

library(dplyr)
library(tidyr)

toothGrowth %>% 
  select(-supp) %>% 
  gather(var, val) %>% # convert the wide data frame to long form, with var = dose and len
  group_by(var) %>% 
  summarise(mean = mean(val), sd = sd(val),
            IQR = IQR(val),
            `0%`= min(val),
            `25%` = quantile(val, 0.25),
            `50%` = median(val),
            `75%` = quantile(val, .75),
            `100%` = max(val),
            n = n())


# A tibble: 2 × 10
    var      mean        sd   IQR  `0%`  `25%` `50%`  `75%` `100%`     n
  <chr>     <dbl>     <dbl> <dbl> <dbl>  <dbl> <dbl>  <dbl>  <dbl> <int>
1  dose  1.166667 0.6288722   1.5   0.5  0.500  1.00  2.000    2.0    60
2   len 18.813333 7.6493152  12.2   4.2 13.075 19.25 25.275   33.9    60  

This approach adds flexibility: you can choose to compute the same statistics for each group (supp in this case):

toothGrowth %>% 
#  select(-supp) %>% 
  gather(var, val, -supp) %>% 
  group_by(supp, var) %>% 
  summarise(mean = mean(val), sd = sd(val),
            IQR = IQR(val),
            `0%`= min(val),
            `25%` = quantile(val, 0.25),
            `50%` = median(val),
            `75%` = quantile(val, .75),
            `100%` = max(val),
            n = n())


Source: local data frame [4 x 11]
Groups: supp [?]

    supp   var      mean        sd   IQR  `0%`  `25%` `50%`  `75%` `100%`     n
   <fctr> <chr>     <dbl>     <dbl> <dbl> <dbl>  <dbl> <dbl>  <dbl>  <dbl> <int>
 1     OJ  dose  1.166667 0.6342703   1.5   0.5  0.500   1.0  2.000    2.0    30
 2     OJ   len 20.663333 6.6055610  10.2   8.2 15.525  22.7 25.725   30.9    30
 3     VC  dose  1.166667 0.6342703   1.5   0.5  0.500   1.0  2.000    2.0    30
 4     VC   len 16.963333 8.2660287  11.9   4.2 11.200  16.5 23.100   33.9    30
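
In more recent tidyr releases gather() is marked as superseded in favour of pivot_longer(), so the grouped summary above could also be sketched as follows. This assumes tidyr 1.0.0 or later and dplyr 1.0.0 or later, uses the built-in ToothGrowth data rather than the downloaded csv, and computes only a few of the statistics for brevity. `by_supp` and `out_file` are names made up for this example.

```r
library(dplyr)
library(tidyr)

by_supp <- ToothGrowth %>%
  # long form: one row per (supp, variable, value)
  pivot_longer(cols = -supp, names_to = "var", values_to = "val") %>%
  group_by(supp, var) %>%
  summarise(mean  = mean(val),
            sd    = sd(val),
            `50%` = median(val),
            n     = n(),
            .groups = "drop")   # return an ungrouped tibble

# n is an ordinary column here, so it is written to the csv like any other.
out_file <- tempfile(fileext = ".csv")
write.csv(by_supp, out_file, row.names = FALSE)
```

Because the summary is already a data frame, nothing has to be stitched back together before exporting.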

==

Another option (if you find writing the long summarise syntax over and over a chore) is to create a function, e.g.:

checkVar <- function(varname, data){
  val <- data[, varname]
  tmp <- data.frame(mean = mean(val), 
                    sd = sd(val),
                    IQR = IQR(val),
                    `0%`= min(val),
                    `25%` = quantile(val, 0.25),
                    `50%` = median(val),
                    `75%` = quantile(val, .75),
                    `100%` = max(val),
                    n = length(val)) 
  names(tmp) <- c("mean", "sd", "IQR", "`0%`", "`25%`", "`50%`", "`75%`", "`100%`", "n")
  rownames(tmp) <- varname
  return(tmp)
} 

Running the custom function gives you the summary statistics:

checkVar("dose", ToothGrowth)


         mean        sd IQR `0%` `25%` `50%` `75%` `100%`  n
dose 1.166667 0.6288722 1.5  0.5   0.5     1     2      2 60

Putting them into a single data.frame involves an apply-family function, e.g. lapply:

do.call(rbind, lapply(c("dose", "len"), checkVar, data=ToothGrowth))


          mean        sd  IQR `0%`  `25%` `50%`  `75%` `100%`  n
dose  1.166667 0.6288722  1.5  0.5  0.500  1.00  2.000    2.0 60
len  18.813333 7.6493152 12.2  4.2 13.075 19.25 25.275   33.9 60
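
Note that the helper above ends up with literal backticks in the column names, which is why they show up in the printed header. A possible variant, sketched here under the assumption that plain "0%"-style names are wanted, passes check.names = FALSE to data.frame so the names survive unchanged. `summariseVar` and `out_file` are hypothetical names for this example, and the built-in ToothGrowth data is used.

```r
# Variant of the helper that keeps "0%"-style names without backticks.
summariseVar <- function(varname, data) {
  val <- data[[varname]]
  q <- quantile(val, probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE)
  data.frame(mean = mean(val),
             sd = sd(val),
             IQR = IQR(val),
             `0%` = q[1], `25%` = q[2], `50%` = q[3], `75%` = q[4], `100%` = q[5],
             n = length(val),
             row.names = varname,
             check.names = FALSE)   # do not rewrite non-syntactic names
}

# One row per variable, combined into a single data.frame.
all_stats <- do.call(rbind, lapply(c("dose", "len"), summariseVar, data = ToothGrowth))

# n is already a column, so it is exported along with the rest.
out_file <- tempfile(fileext = ".csv")
write.csv(all_stats, out_file)
```

When reading such a file back, read.csv(out_file, check.names = FALSE) keeps the percentage headers as written.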

Answer 1: (score: 0)

I ran into the same problem; this elaborates on the previous answer.

I have a summary computed by group:

str(resumenDatos)

List of 4
 $ type      : num 4
 $ table     : num [1:514, 1:8] 3.7544 4.5779 4.135 -1.0582 -0.0789 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ Group    : chr [1:514] "2020_02_28_00" "2020_02_28_01" "2020_02_28_02" "2020_02_28_03" ...
  .. ..$ Statistic: chr [1:8] "mean" "0%" "25%" "50%" ...
 $ statistics: chr [1:2] "mean" "quantiles"
 $ n         : num [1, 1:514] 2948 1784 1756 1306 1064 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr "whatIWantToMeasure"
  .. ..$ : chr [1:514] "2020_02_28_00" "2020_02_28_01" "2020_02_28_02" "2020_02_28_03" ...
 - attr(*, "class")= chr "numSummary"

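Here n is a 1 x 514 matrix while table has 514 rows, so n has to be transposed with t() first. I created the data frame as follows: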

> resumenDatosDF <- data.frame(resumenDatos$table,t(resumenDatos$n))

> names(resumenDatosDF) <- c(colnames(resumenDatos$table), "n")

> str(resumenDatosDF)
'data.frame':   514 obs. of  9 variables:
 $ mean: num  3.7544 4.5779 4.135 -1.0582 -0.0789 ...
 $ 0%  : num  -986 -997 -995 -996 -986 -997 -996 -997 -996 -997 ...
 $ 25% : num  3 3 4 13 17 15 13 3 3 3 ...
 $ 50% : num  14 21 17 24 26 26 25 15 15 13 ...
 $ 75% : num  24 30.2 27 28 28 ...
 $ 90% : num  30 37 31 32 31.7 ...
 $ 99% : num  38 49 38 40 39 ...
 $ 100%: num  250 416 105 57 214 ...
 $ n   : num  2948 1784 1756 1306 1064 ...

> head(resumenDatosDF,10)
                     mean   0% 25% 50%   75%  90%   99% 100%    n
2020_02_28_00  3.75440977 -986   3  14 24.00 30.0 38.00  250 2948
2020_02_28_01  4.57791480 -997   3  21 30.25 37.0 49.00  416 1784
2020_02_28_02  4.13496583 -995   4  17 27.00 31.0 38.00  105 1756
2020_02_28_03 -1.05819296 -996  13  24 28.00 32.0 40.00   57 1306
2020_02_28_04 -0.07894737 -986  17  26 28.00 31.7 39.00  214 1064
2020_02_28_05  3.26701571 -997  15  26 28.00 32.0 39.55   87 1146
2020_02_28_06  4.92619392 -996  13  25 28.00 31.0 39.00   59 1382
2020_02_28_07  1.13968101 -997   3  15 27.00 30.0 40.32  240 2069
2020_02_28_08 -1.99729973 -996   3  15 27.00 31.0 40.00  376 2222
2020_02_28_09  0.59954083 -997   3  13 23.00 33.0 41.52 1086 3049
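
The reason t() is needed in the grouped case can be reproduced on a small scale. The sketch below does not call numSummary: `fake_grouped` is a hypothetical stand-in built with base R from the built-in ToothGrowth data, assumed to mirror the str() output above (groups as rows of `table`, and `n` as a one-row matrix with groups as columns).

```r
# Per-group statistics of len by supp, one row per group.
stat_rows <- lapply(split(ToothGrowth$len, ToothGrowth$supp),
                    function(x) c(mean = mean(x), quantile(x)))
counts <- table(ToothGrowth$supp)

# Hypothetical stand-in with the assumed shape of a grouped numSummary result.
fake_grouped <- list(
  table = do.call(rbind, stat_rows),                    # 2 groups x 6 statistics
  n     = matrix(as.numeric(counts), nrow = 1,
                 dimnames = list("len", names(counts))) # 1 x 2 groups
)

dim(fake_grouped$table)  # groups are rows
dim(fake_grouped$n)      # groups are columns, hence the transpose

# Make sure the groups line up before combining.
stopifnot(identical(rownames(fake_grouped$table), colnames(fake_grouped$n)))

grouped_df <- data.frame(fake_grouped$table,
                         n = as.vector(t(fake_grouped$n)),
                         check.names = FALSE)
grouped_df
```

Naming the column directly with n = ... and passing check.names = FALSE is one way to skip the separate names<- step used above.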