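I have some survey results and I am trying to do some basic cross-tabulations. Each column is a chemical, and the numbers 0:5 code how it was used.
I am trying to come up with a nice table that gives frequencies and percentages. Using table or xtabs I can get separate results for each column, but I would like a way to create one nice table, with all the chemicals in a single table, that I can output to LaTeX.
Thanks for any help you can offer.
The data frame: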
df <- read.table(text = "
V1 V2 V3 V4 V5 V6 V7
1 NA NA NA NA NA NA NA
2 0 0 0 0 0 0 0
3 0 0 0 0 0 0 NA
4 NA NA NA NA NA NA 5
5 0 0 0 0 0 2 0
6 NA 4 NA NA NA NA NA
7 0 0 0 0 0 0 0
8 NA NA NA NA NA 3 NA
9 NA 2 NA NA NA 3 NA
10 NA 4 NA NA NA NA NA
11 0 0 0 0 0 0 0
12 0 0 0 0 0 0 0
13 0 0 0 0 0 0 0
14 NA NA NA NA NA 2 3
15 NA 3 NA 3 NA NA NA
16 NA 4 NA NA NA NA NA
17 0 0 0 0 0 0 0
18 NA 5 NA 5 NA NA NA
19 0 0 0 0 0 0 0
20 NA 1 NA NA NA NA NA", header = TRUE)
Desired output (exact numbers for V1 and V2):
                        V1               V2          etc....
                     Freq  Percent    Freq  Percent
No                      9      100       9     56.2
Poor                    0        0       1      6.2
Somewhat effective      0        0       1      6.2
Good                    0        0       1      6.2
Very Good               0        0       3    18.75
NA                      0        0       1      6.2
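For reference, the per-column result that table already gives can be sketched as follows for V2 alone. The vector v2 is copied by hand from the data frame above, and the names levs and one_col are only illustrative. Passing explicit levels to factor is what keeps zero-count categories in the table, and NA values in the data are dropped from the counts.

```r
# V2 column of the question's data frame, copied by hand (assumption: same order)
v2 <- c(NA, 0, 0, NA, 0, 4, 0, NA, 2, 4, 0, 0, 0, NA, 3, 4, 0, 5, 0, 1)

# Labels for codes 0:5; note that "NA" here is an ordinary label, not a missing value
levs <- c("No", "Poor", "Somewhat Effective", "Good", "Very Good", "NA")

# Explicit levels keep categories with zero observations in the table
counts <- table(factor(v2, levels = 0:5, labels = levs))

# Frequencies next to percentages of the non-missing responses
one_col <- cbind(Freq = counts, Percent = round(100 * prop.table(counts), 2))
print(one_col)
```

With 16 non-missing responses, the unrounded percentages are 56.25 for "No" and 6.25 for each single response.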
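Answer 0 (score: 3)
Here we use lapply and table to get the frequencies of each column. lapply passes the columns of the data.frame one at a time, as elements of a list; each column is converted to a factor with its levels specified as 0:5, and then table is applied. Use prop.table to get the proportions, cbind the Freq and Percent columns, convert the list to a single table with do.call(cbind, ...), and finally rename the row.names and colnames.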
res <- do.call(cbind, lapply(df, function(x) {
  x1 <- table(factor(x, levels = 0:5,
                     labels = c('No', 'Poor', 'Somewhat Effective',
                                'Good', 'Very Good', 'NA')))
  cbind(Freq = x1, Percent = round(100 * prop.table(x1), 2))
}))
colnames(res) <- paste(rep(paste0('V', 1:7), each = 2),
                       colnames(res), sep = ".")
head(res,2)
# V1.Freq V1.Percent V2.Freq V2.Percent V3.Freq V3.Percent V4.Freq
#No 9 100 9 56.25 9 100 9
#Poor 0 0 1 6.25 0 0 0
# V4.Percent V5.Freq V5.Percent V6.Freq V6.Percent V7.Freq V7.Percent
#No 81.82 9 100 8 66.67 8 80
#Poor 0.00 0 0 0 0.00 0 0
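The question also asks for LaTeX output, which the code above does not cover. A minimal base-R sketch of one way to do it follows, restricted to V1 and V2 to keep it short. The helper to_latex is hypothetical (it is not part of any package), and the vectors v1 and v2 are copied by hand from the question's data. Packages such as xtable can also produce LaTeX tables and may be a better fit for real use.

```r
# V1 and V2 from the question's data frame, copied by hand
v1 <- c(NA, 0, 0, NA, 0, NA, 0, NA, NA, NA, 0, 0, 0, NA, NA, NA, 0, NA, 0, NA)
v2 <- c(NA, 0, 0, NA, 0, 4, 0, NA, 2, 4, 0, 0, 0, NA, 3, 4, 0, 5, 0, 1)
dat <- data.frame(V1 = v1, V2 = v2)
levs <- c("No", "Poor", "Somewhat Effective", "Good", "Very Good", "NA")

# Same frequency/percent table as in the answer, for two columns only
res <- do.call(cbind, lapply(dat, function(x) {
  x1 <- table(factor(x, levels = 0:5, labels = levs))
  cbind(Freq = x1, Percent = round(100 * prop.table(x1), 2))
}))

# Hypothetical helper: turn the matrix into the lines of a LaTeX tabular,
# with one \multicolumn header per variable above the Freq/Percent pairs
to_latex <- function(m, vars) {
  top <- paste0(" & ",
                paste0("\\multicolumn{2}{c}{", vars, "}", collapse = " & "),
                " \\\\")
  sub <- paste0(" & ", paste(colnames(m), collapse = " & "), " \\\\")
  body <- paste0(rownames(m), " & ",
                 apply(m, 1, function(r) paste(r, collapse = " & ")),
                 " \\\\")
  c(paste0("\\begin{tabular}{l", strrep("r", ncol(m)), "}"),
    top, sub, "\\hline", body, "\\end{tabular}")
}

latex_lines <- to_latex(res, names(dat))
cat(latex_lines, sep = "\n")
```

The sketch does no escaping of special LaTeX characters, so row or column names containing characters such as & or % would need handling before use.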
Answer 1 (score: 2)
I'm not a regular "dplyr" or "tidyr" user, so I'm not sure whether this is the best approach with these tools (but it seems to work):
library(dplyr)
library(tidyr)
df %>%
  gather(var, val, V1:V7) %>%                ## Make the data long
  na.omit() %>%                              ## We don't need the NAs
  ## Factor the "value" column
  mutate(val = factor(val, 0:5, c("No", "Poor", "Somewhat Effective",
                                  "Good", "Very Good", "NA"))) %>%
  group_by(val, var) %>%                     ## Group by val and var
  summarise(Freq = n()) %>%                  ## Get the count
  group_by(var) %>%                          ## Group just by var now
  mutate(Pct = Freq/sum(Freq) * 100) %>%     ## Calculate the percent
  gather(R1, R2, Freq:Pct) %>%               ## Go long again....
  unite(Var, var, R1) %>%                    ## Combine the var and R1 cols
  spread(Var, R2, fill = 0)                  ## Go wide....
# Source: local data frame [6 x 15]
#
# val V1_Freq V1_Pct V2_Freq V2_Pct V3_Freq V3_Pct V4_Freq
# 1 No 9 100 9 56.25 9 100 9
# 2 Poor 0 0 1 6.25 0 0 0
# 3 Somewhat Effective 0 0 1 6.25 0 0 0
# 4 Good 0 0 1 6.25 0 0 1
# 5 Very Good 0 0 3 18.75 0 0 0
# 6 NA 0 0 1 6.25 0 0 1
# Variables not shown: V4_Pct (dbl), V5_Freq (dbl), V5_Pct (dbl), V6_Freq
# (dbl), V6_Pct (dbl), V7_Freq (dbl), V7_Pct (dbl)
The "data.table" approach is similar in terms of the sequence of steps you have to go through.
library(data.table)
library(reshape2)
levs <- c("No", "Poor", "Somewhat Effective", "Good", "Very Good", "NA")
DT <- melt(as.data.table(df, keep.rownames = TRUE),
id.vars = "rn", na.rm = TRUE)
DT <- DT[, value := factor(value, 0:5, levs)
][, list(Freq = .N), by = list(variable, value)
][, Pct := Freq/sum(Freq) * 100, by = list(variable)]
dcast.data.table(melt(DT, id.vars = c("variable", "value")),
value ~ variable + variable.1,
value.var = "value.1", fill = 0)
OK, one more... (a variation on @akrun's answer)
library(gdata) ## For "interleave"
levs <- c("No", "Poor", "Somewhat Effective", "Good", "Very Good", "NA")
x1 <- sapply(lapply(df, factor, 0:5, levs), table)
## Multiply by 100 so the second column of each pair is a percent, not a proportion
t(interleave(t(x1), t(prop.table(x1, 2) * 100)))
### Or, skipping the transposing....
## library(SOfun) ## For "Riffle" which is like "interleave"
## Riffle(x1, prop.table(x1, 2) * 100)
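If neither gdata nor SOfun is available, the same interleaving can be sketched in base R by binding the two matrices side by side and reordering the columns. This is an illustrative variant, not the answer's own code: the names dat, both, idx and out are made up for the example, and only V1 and V2 (copied by hand from the question's data) are used to keep it short.

```r
# V1 and V2 from the question's data frame, copied by hand
v1 <- c(NA, 0, 0, NA, 0, NA, 0, NA, NA, NA, 0, 0, 0, NA, NA, NA, 0, NA, 0, NA)
v2 <- c(NA, 0, 0, NA, 0, 4, 0, NA, 2, 4, 0, 0, 0, NA, 3, 4, 0, 5, 0, 1)
dat <- data.frame(V1 = v1, V2 = v2)
levs <- c("No", "Poor", "Somewhat Effective", "Good", "Very Good", "NA")

# One column of counts per variable, one row per level
x1 <- sapply(lapply(dat, factor, 0:5, levs), table)

# Column-wise percentages of the non-missing responses
pct <- prop.table(x1, 2) * 100

# Bind counts and percentages side by side, then label the columns
both <- cbind(x1, pct)
colnames(both) <- paste(rep(colnames(x1), 2),
                        rep(c("Freq", "Pct"), each = ncol(x1)), sep = "_")

# order() keeps ties in their original order, so this yields 1, 3, 2, 4, ...
# which places each variable's Freq column directly before its Pct column
idx <- order(rep(seq_len(ncol(x1)), 2))
out <- both[, idx]
print(out)
```

The reordering relies only on the column positions, so it works for any number of variables as long as the two matrices have the same columns in the same order.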