I want to summarize only the numeric columns of an R data frame. I am doing the following:
numeric_var <- names(df)[which(sapply(df, is.numeric))]
summary(df[,.SD, .SDcols = numeric_var])
However, I get the following error:
Error in `[.data.frame`(df, , .SD, .SDcols = numeric_var) :
unused argument (.SDcols = numeric_var)
How can I do this in R?
Answer 0 (score: 2)
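It appears that the OP is using data.table syntax (i.e. .SDcols = ...), but according to the error message, df is only of class data.frame, and [.data.frame has no .SDcols argument.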
To use data.table syntax, the data.table package must be loaded and df must be coerced to class data.table (for example with setDT()). Here is a complete example:
set.seed(1234L)
DF <- data.frame(a = LETTERS[1:5], b = rnorm(5), c = 1:5)
DF
# a b c
#1 A -1.2070657 1
#2 B 0.2774292 2
#3 C 1.0844412 3
#4 D -2.3456977 4
#5 E 0.4291247 5
numeric_var <- names(DF)[sapply(DF, is.numeric)]
library(data.table)
setDT(DF)[, summary(.SD), .SDcols = numeric_var]
# b c
# Min. :-2.3457 Min. :1
# 1st Qu.:-1.2071 1st Qu.:2
# Median : 0.2774 Median :3
# Mean :-0.3524 Mean :3
# 3rd Qu.: 0.4291 3rd Qu.:4
# Max. : 1.0844 Max. :5
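If you would rather not depend on data.table, the same selection works in base R by indexing the data frame with a logical vector instead of using .SDcols. A minimal sketch, reusing the example data above (the names is_num and num_summary are illustrative only):

```r
# Same example data as above
set.seed(1234L)
DF <- data.frame(a = LETTERS[1:5], b = rnorm(5), c = 1:5)

# TRUE for every numeric column, FALSE otherwise
is_num <- vapply(DF, is.numeric, logical(1))

# drop = FALSE keeps a data frame even when only one column is numeric
num_summary <- summary(DF[, is_num, drop = FALSE])
print(num_summary)
```

This is the plain data.frame counterpart of the df[, .SD, .SDcols = numeric_var] call from the question.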
Answer 1 (score: 1)
We can use the tidyverse (here, dplyr):
library(dplyr)
df %>%
select_if(is.numeric)
# col2 col3
#1: -0.5458808 0.6048889
#2: 0.5365853 0.3707349
#3: 0.4196231 0.6716903
#4: -0.5836272 0.6729823
#5: 0.8474600 0.3204306
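In dplyr 1.0.0 and later, select_if() is superseded; the equivalent selection uses the where() helper inside select(). A sketch, assuming a recent dplyr and data shaped like the df defined at the end of this answer (recreated here as a plain data.frame so the block runs on its own):

```r
library(dplyr)

# Recreate data shaped like the answer's df, as a plain data.frame
set.seed(24)
df <- data.frame(col1 = letters[1:5], col2 = rnorm(5), col3 = runif(5))

# where() applies the predicate to each column and keeps those returning TRUE
num_df <- df %>% select(where(is.numeric))
print(num_df)
```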
If we need the summary within the tidyverse (funs() is deprecated, so a lambda formula is used instead):
df %>%
select_if(is.numeric) %>%
summarise_all(~ list(summary(.)))
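With dplyr 1.0.0 or later, the same list-column summary can also be written with across() and where(), which replace the scoped _all/_if variants. A sketch under the same assumptions as above (plain data.frame recreated from the answer's data):

```r
library(dplyr)

set.seed(24)
df <- data.frame(col1 = letters[1:5], col2 = rnorm(5), col3 = runif(5))

# One-row result; each numeric column becomes a list column holding its summary()
res <- df %>%
  summarise(across(where(is.numeric), ~ list(summary(.x))))

# The six-number summary of col2 sits in the first element of its list column
res$col2[[1]]
```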
Or, to get the output in wide format:
df %>%
select_if(is.numeric) %>%
do(data.frame(lapply(., function(x) as.list(summary(x)))))
# col2.Min. col2.1st.Qu. col2.Median col2.Mean col2.3rd.Qu. col2.Max. col3.Min. col3.1st.Qu. col3.Median col3.Mean col3.3rd.Qu. col3.Max.
#1 -0.5836272 -0.5458808 0.4196231 0.1348321 0.5365853 0.84746 0.3204306 0.3707349 0.6048889 0.5281454 0.6716903 0.6729823
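A base R alternative that needs no packages: sapply() over the numeric columns simplifies to a matrix with one row per statistic and one column per variable, which is often easier to read than a single very wide row. A sketch with the data recreated as a plain data.frame:

```r
set.seed(24)
df <- data.frame(col1 = letters[1:5], col2 = rnorm(5), col3 = runif(5))

# Keep only numeric columns, then summarise each one;
# sapply() binds the six-number summaries into a 6 x k matrix
num_cols <- vapply(df, is.numeric, logical(1))
summ_mat <- sapply(df[num_cols], summary)
print(summ_mat)
```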
Or with data.table. It is not clear what form the output should take, but the summary function can be applied to each column separately:
library(data.table)
i1 <- which(unlist(lapply(df, is.numeric)))
setDT(df)[, unlist(lapply(.SD, summary), recursive = FALSE), .SDcols = i1]
# col2.Min. col2.1st Qu. col2.Median col2.Mean col2.3rd Qu. col2.Max. col3.Min. col3.1st Qu. col3.Median col3.Mean col3.3rd Qu. col3.Max.
# -0.5836272 -0.5458808 0.4196231 0.1348321 0.5365853 0.8474600 0.3204306 0.3707349 0.6048889 0.5281454 0.6716903 0.6729823
Data used in this answer:
library(data.table)
set.seed(24)
df <- data.table(col1 = letters[1:5], col2 = rnorm(5), col3 = runif(5))
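In data.table 1.12.10 and later, .SDcols also accepts a predicate function, so the numeric columns can be selected without computing i1 first. A sketch assuming such a version, returning one row per statistic (the dt and stat names are illustrative only):

```r
library(data.table)

set.seed(24)
dt <- data.table(col1 = letters[1:5], col2 = rnorm(5), col3 = runif(5))

# .SDcols = is.numeric keeps only the columns for which is.numeric() is TRUE;
# as.numeric() drops the summaryDefault class so each result column is a plain double
res <- dt[, lapply(.SD, function(x) as.numeric(summary(x))), .SDcols = is.numeric]

# Label each row with its statistic name and move that column to the front
res[, stat := names(summary(dt$col2))]
setcolorder(res, "stat")
print(res)
```

This long layout keeps every statistic aligned across variables, at the cost of dropping the summaryDefault print formatting.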