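I'm having trouble exporting some information from R. The information is a summary, and using the advice linked here I can convert most of it to CSV, which basically comes down to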
write.csv(numsummary$table)
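But every time I use it, the last column is dropped from the CSV output.
I haven't found a way to get the last column into the CSV output. Does anyone know how to do this, or can you point me to a resource that explains how?
If I can provide any more useful information, please let me know, and thanks in advance for your help!
Edit: Here is a complete R script for an example where the last column (in this case the one headed 'n') is cut off. Using write.csv(input$table) seems to leave out the last column for every kind of output I use it on, not just numeric summaries.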
#start toothGrowth csv generation
library(RcmdrMisc) # provides numSummary()
#dataset available at https://vincentarelbundock.github.io/Rdatasets/csv/datasets/ToothGrowth.csv
toothGrowth <- read.table("ToothGrowth.csv", header=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE)
numSumTooth <- numSummary(toothGrowth[,c("dose", "len", "X")], statistics=c("mean", "sd", "IQR", "quantiles"), quantiles=c(0,.25,.5,.75,1))
str(toothGrowth)
numSumTooth
write.csv(numSumTooth$table, file="numSumTooth.csv")
#end toothGrowth csv generation
The output I get from the script above is linked on pastebin: sumSumTooth
Answer 0 (score: 0)
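The "n" column is missing because that value is stored in numSummaryObj$n, while the other summary statistics are stored in numSummaryObj$table.
Putting it back only takes a simple cbind or data.frame call: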
file <- "https://vincentarelbundock.github.io/Rdatasets/csv/datasets/ToothGrowth.csv"
toothGrowth <- read.table(file, header=TRUE, sep=",", row.names=1, na.strings="NA", dec=".", strip.white=TRUE)
numSumTooth <- RcmdrMisc::numSummary(toothGrowth[, c("len", "dose")])
nST <- data.frame(numSumTooth$table, numSumTooth$n)
names(nST) <- c(colnames(numSumTooth$table), "n")
write.csv(nST, "numSumTooth.csv")
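The cbind route mentioned above might look like the following minimal sketch. So that it runs without RcmdrMisc, it builds a stand-in list called `mock` with a `table` matrix and an `n` vector from R's built-in ToothGrowth data; `mock` and its layout are assumptions made for illustration, not the actual output of numSummary.

```r
# Stand-in for a numSummary-like result (hypothetical, built with base R only).
vars <- ToothGrowth[, c("len", "dose")]
mock <- list(
  table = t(sapply(vars, function(x) c(mean = mean(x), sd = sd(x)))),
  n     = colSums(!is.na(vars))  # counts are kept apart from the table
)

# cbind() appends the counts as an extra column named "n".
out <- cbind(mock$table, n = mock$n)

# Write to a temporary file and read it back to confirm "n" survives.
csv_path <- tempfile(fileext = ".csv")
write.csv(out, csv_path)
back <- read.csv(csv_path, row.names = 1)
print(back)
```

Because `out` is still a matrix here, cbind keeps the row names (the variable names) and write.csv stores them as the first column of the file.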
==
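Edit:
Personally, I would invest in learning packages such as dplyr and tidyr for data manipulation, since they will give you a lot of mileage and flexibility down the road. For example, to produce the same summary as numSummary in a data.frame, you could run: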
library(dplyr)
library(tidyr)
toothGrowth %>%
select(-supp) %>%
gather(var, val) %>% #convert the wide data frame into the long-form, with var = dose and len
group_by(var) %>%
summarise(mean = mean(val), sd = sd(val),
IQR = IQR(val),
`0%`= min(val),
`25%` = quantile(val, 0.25),
`50%` = median(val),
`75%` = quantile(val, .75),
`100%` = max(val),
n = n())
# A tibble: 2 × 10
var mean sd IQR `0%` `25%` `50%` `75%` `100%` n
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
1 dose 1.166667 0.6288722 1.5 0.5 0.500 1.00 2.000 2.0 60
2 len 18.813333 7.6493152 12.2 4.2 13.075 19.25 25.275 33.9 60
This approach adds flexibility: you can also compute the statistics for each group (in this case, supp):
toothGrowth %>%
# select(-supp) %>%
gather(var, val, -supp) %>%
group_by(supp, var) %>%
summarise(mean = mean(val), sd = sd(val),
IQR = IQR(val),
`0%`= min(val),
`25%` = quantile(val, 0.25),
`50%` = median(val),
`75%` = quantile(val, .75),
`100%` = max(val),
n = n())
Source: local data frame [4 x 11]
Groups: supp [?]
supp var mean sd IQR `0%` `25%` `50%` `75%` `100%` n
<fctr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
1 OJ dose 1.166667 0.6342703 1.5 0.5 0.500 1.0 2.000 2.0 30
2 OJ len 20.663333 6.6055610 10.2 8.2 15.525 22.7 25.725 30.9 30
3 VC dose 1.166667 0.6342703 1.5 0.5 0.500 1.0 2.000 2.0 30
4 VC len 16.963333 8.2660287 11.9 4.2 11.200 16.5 23.100 33.9 30
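Since the original goal was a CSV file, a summarised table like this can be passed straight to write.csv. The sketch below is a shortened variant, not the code above: it assumes dplyr (1.0 or later, for the `.groups` argument) and tidyr are installed, uses the built-in ToothGrowth data, and swaps gather() for pivot_longer(), which the tidyr documentation recommends for new code. The name `summary_tbl` is made up for illustration.

```r
library(dplyr)
library(tidyr)

# Grouped summary on the built-in ToothGrowth data (fewer statistics for brevity).
summary_tbl <- ToothGrowth %>%
  pivot_longer(-supp, names_to = "var", values_to = "val") %>%
  group_by(supp, var) %>%
  summarise(mean = mean(val), sd = sd(val), n = n(), .groups = "drop")

# n is an ordinary column here, so nothing is lost on export.
csv_path <- tempfile(fileext = ".csv")
write.csv(summary_tbl, csv_path, row.names = FALSE)
back <- read.csv(csv_path)
print(back)
```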
==
Another option (if you find writing the long summarise call over and over a chore) is to create a function, for example:
checkVar <- function(varname, data){
val <- data[, varname]
tmp <- data.frame(mean = mean(val),
sd = sd(val),
IQR = IQR(val),
`0%`= min(val),
`25%` = quantile(val, 0.25),
`50%` = median(val),
`75%` = quantile(val, .75),
`100%` = max(val),
n = length(val))
names(tmp) <- c("mean", "sd", "IQR", "`0%`", "`25%`", "`50%`", "`75%`", "`100%`", "n")
rownames(tmp) <- varname
return(tmp)
}
Running the custom function gives you the summary statistics:
checkVar("dose", ToothGrowth)
mean sd IQR `0%` `25%` `50%` `75%` `100%` n
dose 1.166667 0.6288722 1.5 0.5 0.5 1 2 2 60
Combining them into a single data.frame takes an apply-family function, e.g. lapply:
do.call(rbind, lapply(c("dose", "len"), checkVar, data=ToothGrowth))
mean sd IQR `0%` `25%` `50%` `75%` `100%` n
dose 1.166667 0.6288722 1.5 0.5 0.500 1.00 2.000 2.0 60
len 18.813333 7.6493152 12.2 4.2 13.075 19.25 25.275 33.9 60
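To finish the job and get the combined summary into a CSV file, something along these lines should work. This is a base-R sketch under a few assumptions: `summarise_var` is a hypothetical helper (not the checkVar function above) that returns a named numeric vector, and the file is written to a temporary path. Reading back with check.names = FALSE keeps headers such as "0%" from being rewritten into syntactic names.

```r
# Hypothetical helper: returns a named numeric vector of statistics.
summarise_var <- function(val) {
  c(mean = mean(val), sd = sd(val), IQR = IQR(val),
    quantile(val, c(0, 0.25, 0.5, 0.75, 1)),  # named "0%", "25%", ...
    n = length(val))
}

# lapply() over the columns, then rbind() the vectors into one matrix;
# the list names ("dose", "len") become the row names.
combined <- do.call(rbind, lapply(ToothGrowth[c("dose", "len")], summarise_var))

csv_path <- tempfile(fileext = ".csv")
write.csv(combined, csv_path)

# check.names = FALSE preserves column names like "0%" when reading back.
back <- read.csv(csv_path, row.names = 1, check.names = FALSE)
print(back)
```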
Answer 1 (score: 0)
I ran into the same problem; here is some more detail on the previous answer.
I have a summary:
str(resumenDatos)
List of 4
$ type : num 4
$ table : num [1:514, 1:8] 3.7544 4.5779 4.135 -1.0582 -0.0789 ...
..- attr(*, "dimnames")=List of 2
.. ..$ Group : chr [1:514] "2020_02_28_00" "2020_02_28_01" "2020_02_28_02" "2020_02_28_03" ...
.. ..$ Statistic: chr [1:8] "mean" "0%" "25%" "50%" ...
$ statistics: chr [1:2] "mean" "quantiles"
$ n : num [1, 1:514] 2948 1784 1756 1306 1064 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr "whatIWantToMeasure"
.. ..$ : chr [1:514] "2020_02_28_00" "2020_02_28_01" "2020_02_28_02" "2020_02_28_03" ...
- attr(*, "class")= chr "numSummary"
I created the data frame as follows:
> resumenDatosDF <- data.frame(resumenDatos$table,t(resumenDatos$n))
> names(resumenDatosDF) <- c(colnames(resumenDatos$table), "n")
> str(resumenDatosDF)
'data.frame': 514 obs. of 9 variables:
$ mean: num 3.7544 4.5779 4.135 -1.0582 -0.0789 ...
$ 0% : num -986 -997 -995 -996 -986 -997 -996 -997 -996 -997 ...
$ 25% : num 3 3 4 13 17 15 13 3 3 3 ...
$ 50% : num 14 21 17 24 26 26 25 15 15 13 ...
$ 75% : num 24 30.2 27 28 28 ...
$ 90% : num 30 37 31 32 31.7 ...
$ 99% : num 38 49 38 40 39 ...
$ 100%: num 250 416 105 57 214 ...
$ n : num 2948 1784 1756 1306 1064 ...
> head(resumenDatosDF,10)
mean 0% 25% 50% 75% 90% 99% 100% n
2020_02_28_00 3.75440977 -986 3 14 24.00 30.0 38.00 250 2948
2020_02_28_01 4.57791480 -997 3 21 30.25 37.0 49.00 416 1784
2020_02_28_02 4.13496583 -995 4 17 27.00 31.0 38.00 105 1756
2020_02_28_03 -1.05819296 -996 13 24 28.00 32.0 40.00 57 1306
2020_02_28_04 -0.07894737 -986 17 26 28.00 31.7 39.00 214 1064
2020_02_28_05 3.26701571 -997 15 26 28.00 32.0 39.55 87 1146
2020_02_28_06 4.92619392 -996 13 25 28.00 31.0 39.00 59 1382
2020_02_28_07 1.13968101 -997 3 15 27.00 30.0 40.32 240 2069
2020_02_28_08 -1.99729973 -996 3 15 27.00 31.0 40.00 376 2222
2020_02_28_09 0.59954083 -997 3 13 23.00 33.0 41.52 1086 3049
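The t() call matters because, in a grouped summary like the one above, n is stored as a 1 x groups matrix while the table has one row per group. Here is a small sketch of the same idea on a mock object built from the built-in ToothGrowth data; the names `tab` and `n` and their layout are assumptions chosen to mirror the str() output above, not actual numSummary output.

```r
# Mock of a grouped summary: one row per group in the table,
# and the counts as a 1 x groups matrix (as in the str() output above).
groups <- split(ToothGrowth$len, ToothGrowth$supp)
tab <- t(sapply(groups, function(x) c(mean = mean(x), sd = sd(x))))
n <- matrix(lengths(groups), nrow = 1,
            dimnames = list("len", names(groups)))

# t(n) turns the 1 x groups matrix into a groups x 1 column,
# so it lines up with the rows of the table.
out <- data.frame(tab, t(n))
names(out) <- c(colnames(tab), "n")
print(out)
```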