I am new to R. I am writing a separate manual of the syntax for the functions I commonly use in my work. My sample data frame is as follows:
x.sample <-
structure(list(Q9_A = structure(c(5L, 3L, 5L, 3L, 5L, 3L, 1L,
5L, 5L, 5L), .Label = c("Impt", "Neutral", "Not Impt at all",
"Somewhat Impt", "Very Impt"), class = "factor"), Q9_B = structure(c(5L,
5L, 5L, 3L, 5L, 5L, 3L, 5L, 3L, 3L), .Label = c("Impt", "Neutral",
"Not Impt at all", "Somewhat Impt", "Very Impt"), class = "factor"),
Q9_C = structure(c(3L, 5L, 5L, 3L, 5L, 5L, 3L, 5L, 5L, 3L
), .Label = c("Impt", "Neutral", "Not Impt at all", "Somewhat Impt",
"Very Impt"), class = "factor")), .Names = c("Q9_A", "Q9_B",
"Q9_C"), row.names = c(NA, 10L), class = "data.frame")
> x.sample
              Q9_A            Q9_B            Q9_C
1        Very Impt       Very Impt Not Impt at all
2  Not Impt at all       Very Impt       Very Impt
3        Very Impt       Very Impt       Very Impt
4  Not Impt at all Not Impt at all Not Impt at all
5        Very Impt       Very Impt       Very Impt
6  Not Impt at all       Very Impt       Very Impt
7             Impt Not Impt at all Not Impt at all
8        Very Impt       Very Impt       Very Impt
9        Very Impt Not Impt at all       Very Impt
10       Very Impt Not Impt at all Not Impt at all
My original data frame has 21 columns.
If I want to find the mean (treating each column as an ordinal variable):
> sapply(x.sample,function(x) mean(as.numeric(x), na.rm=TRUE))
Q9_A Q9_B Q9_C
4.0 4.2 4.2
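One caveat about the means above: as.numeric() on a factor returns the internal integer codes, and the levels in x.sample are stored alphabetically ("Impt" = 1, ..., "Very Impt" = 5) rather than in order of importance. A minimal sketch, assuming the intended scale runs from "Not Impt at all" (1) to "Very Impt" (5); the names levs, alpha and q are illustrative and do not come from the original post:

```r
# Intended order of the scale (an assumption about what the survey means)
levs  <- c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt")
# Alphabetical order, which is how the levels are stored in x.sample
alpha <- c("Impt", "Neutral", "Not Impt at all", "Somewhat Impt", "Very Impt")

# Small stand-in for one column of x.sample
q <- factor(c("Very Impt", "Not Impt at all", "Impt"), levels = alpha)

# Alphabetical codes: Very Impt = 5, Not Impt at all = 3, Impt = 1
alpha_mean <- mean(as.numeric(q))                           # 3

# Re-level first so the codes follow the intended order: 5, 1, 4
ordered_mean <- mean(as.numeric(factor(q, levels = levs)))  # 10/3
```

If the scale really is ordered this way, re-levelling before calling as.numeric() is what makes the mean interpretable.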
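I want to tabulate the frequencies of all the variables in the data frame. I searched the internet and many forums, and the closest command I found uses sapply. But when I try it, every count comes out as 0.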
> sapply(x.sample,function(x) table(factor(x.sample, levels=c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt"), ordered=TRUE)))
                Q9_A Q9_B Q9_C
Not Impt at all    0    0    0
Somewhat Impt      0    0    0
Neutral            0    0    0
Impt               0    0    0
Very Impt          0    0    0
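Question: for all the columns (i.e. factors) in the data frame, how can I use sapply to produce a frequency table laid out like the one above?
P.S. Apologies if this seems trivial, but I have searched for 2 days without finding an answer and have tried every combination I could think of. Maybe I did not search hard enough =(
Thanks very much.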
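Answer 0 (score: 8)
You were nearly there. One small change to your function gets you the rest of the way: the x in function(x) ... needs to be passed to the factor() call inside table(), instead of x.sample: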
levs <- c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt")
sapply(x.sample, function(x) table(factor(x, levels=levs, ordered=TRUE)))
A little rearranging of the code might also make it easier to read:
sapply(lapply(x.sample,factor,levels=levs,ordered=TRUE), table)
#                Q9_A Q9_B Q9_C
#Not Impt at all    3    4    4
#Somewhat Impt      0    0    0
#Neutral            0    0    0
#Impt               1    0    0
#Very Impt          6    6    6
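To see why the original call returned only zeros, it can help to run both versions side by side. This is a minimal sketch on a small made-up data frame (df.demo and its values are hypothetical, chosen only to share the level labels of x.sample). In the first call the anonymous function ignores its argument x and factors the whole data frame; none of the data frame's elements, once converted to character, equals one of the labels, so every count is 0.

```r
levs  <- c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt")
alpha <- c("Impt", "Neutral", "Not Impt at all", "Somewhat Impt", "Very Impt")

# Two-column stand-in for x.sample (hypothetical data, same level labels)
df.demo <- data.frame(
  Q9_A = factor(c("Very Impt", "Impt", "Very Impt"), levels = alpha),
  Q9_B = factor(c("Not Impt at all", "Very Impt", "Very Impt"), levels = alpha)
)

# Original attempt: x is never used, the whole data frame is factored instead
wrong <- sapply(df.demo, function(x) table(factor(df.demo, levels = levs)))

# Fix: factor the single column that sapply() hands to the function
right <- sapply(df.demo, function(x) table(factor(x, levels = levs)))

wrong  # all zeros
right  # per-column counts, rows in the order given by levs
```

Both results are 5 x 2 matrices with the same row names, which is why the mistake is easy to miss: only the counts differ.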
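Answer 1 (score: 8)
A bit late, but here is a possible reshape2 solution. It could have been very simple with recast, but we need to handle empty factor levels here, so we have to specify both factorsAsStrings = FALSE in melt and drop = FALSE in dcast. Since recast cannot pass arguments to melt (only to dcast), we call melt and dcast separately: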
library(reshape2)
x.sample$indx <- 1
dcast(melt(x.sample, "indx", factorsAsStrings = FALSE), value ~ variable, drop = FALSE)
#             value Q9_A Q9_B Q9_C
# 1            Impt    1    0    0
# 2         Neutral    0    0    0
# 3 Not Impt at all    3    4    4
# 4   Somewhat Impt    0    0    0
# 5       Very Impt    6    6    6
If we do not care about the empty levels, a quick solution is simply
recast(x.sample, value ~ variable, id.var = "indx")
#             value Q9_A Q9_B Q9_C
# 1            Impt    1    0    0
# 2 Not Impt at all    3    4    4
# 3       Very Impt    6    6    6
Or, if speed is a concern, we can do the same with data.table
library(data.table)
dcast(melt(setDT(x.sample), measure.vars = names(x.sample), value.factor = TRUE),
value ~ variable, drop = FALSE)
#              value Q9_A Q9_B Q9_C
# 1:            Impt    1    0    0
# 2:         Neutral    0    0    0
# 3: Not Impt at all    3    4    4
# 4:   Somewhat Impt    0    0    0
# 5:       Very Impt    6    6    6
Answer 2 (score: 5)
Why not just:
> sapply(x.sample, table)
                Q9_A Q9_B Q9_C
Impt               1    0    0
Neutral            0    0    0
Not Impt at all    3    4    4
Somewhat Impt      0    0    0
Very Impt          6    6    6
Call that result 'tbl' (i.e. tbl <- sapply(x.sample, table)), then reorder its rows:
tbl[ order(match(rownames(tbl), c("Not Impt at all", "Somewhat Impt",
"Neutral", "Impt", "Very Impt")) ) , ]
                Q9_A Q9_B Q9_C
Not Impt at all    3    4    4
Somewhat Impt      0    0    0
Neutral            0    0    0
Impt               1    0    0
Very Impt          6    6    6
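Because the result of sapply() here is a matrix whose row names are the level labels, the rows can also be reordered by indexing with those labels directly. A minimal sketch on made-up data (df.demo, by.match and by.name are illustrative names, not from the original answer), assuming every label in levs is present among the row names:

```r
levs  <- c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt")
alpha <- c("Impt", "Neutral", "Not Impt at all", "Somewhat Impt", "Very Impt")

# Hypothetical stand-in for x.sample with alphabetically stored levels
df.demo <- data.frame(
  Q9_A = factor(c("Very Impt", "Impt", "Very Impt"), levels = alpha),
  Q9_B = factor(c("Not Impt at all", "Very Impt", "Very Impt"), levels = alpha)
)

tbl <- sapply(df.demo, table)   # rows come out in alphabetical level order

# Reordering as in the answer above
by.match <- tbl[order(match(rownames(tbl), levs)), ]

# Equivalent here: index the rows by name in the desired order
by.name <- tbl[levs, ]

identical(by.match, by.name)
```

Indexing by name stops with a "subscript out of bounds" error if a label is missing from the row names, whereas the match() version works with whatever rows are present.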