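I'm used to SPSS, and I really like using Custom Tables there for reporting survey data. I would be very happy if I could do something similar in R.
What I want is a table with multiple rows and multiple columns, showing column percentages and counts (N, the percentage base).
Here is sample code for the survey data: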
set.seed(321)
ID <- seq_len(200)
Age <- sample(c("18-34", "35-59"), 200, replace = TRUE)
Sex <- sample(c("Male", "Female"), 200, replace = TRUE)
TOTAL <- rep("TOTAL", 200)
Edu <- sample(c("Lower", "Middle", "Higher"), 200, replace = TRUE)
PurchaseInt <- sample(c("Definitely yes", "Somewhat yes", "Somewhat not", "Definitely not"), 200, replace = TRUE)
Relevance <- sample(c("Definitely fits my needs", "Somewhat fits my needs", "Somewhat does not fit", "Definitely does not fit"), 200, replace = TRUE)
DF <- data.frame(ID, TOTAL, Sex, Age, Edu, PurchaseInt, Relevance)
head(DF)
  ID TOTAL    Sex   Age    Edu    PurchaseInt               Relevance
1  1 TOTAL   Male 35-59  Lower Definitely yes  Somewhat fits my needs
2  2 TOTAL   Male 35-59 Higher   Somewhat not Definitely does not fit
3  3 TOTAL   Male 18-34 Higher Definitely yes   Somewhat does not fit
4  4 TOTAL Female 18-34  Lower   Somewhat not Definitely does not fit
5  5 TOTAL Female 18-34 Higher Definitely yes   Somewhat does not fit
6  6 TOTAL Female 18-34 Higher Definitely not Definitely does not fit
# Simple table, 1 variable by 1 variable, no N (BASE) BAD TABLE :(
prop.table(table(DF$PurchaseInt, DF$Sex),2)
                 Female Male
  Definitely not   0.28 0.30
  Definitely yes   0.25 0.28
  Somewhat not     0.29 0.24
  Somewhat yes     0.17 0.18
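What I would really like is something like this (produced in SPSS):
I realize that combining counts with column percentages may be especially tricky. What is crucial for me is being able to report multiple row and column variables in a single table (multiple columns especially), because that helps A LOT with data analysis.
Answer 0 (score: 1)
There are two parts to this: first you want to create the table, then you want to report it. You are splitting by different margins and then putting everything into one table, which is a little odd. I am also not sure how you got those numbers: are they meant to be column percentages? If so, I get different ones with your random seed.
Anyway, here is part 1, which gets you the data.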
# a useful function
table_by <- function(row_var, col_var = NULL) {
  # the repeated t() below ensures you have a 4 x 1 matrix
  tbl <- if (is.null(col_var)) t(t(table(DF[[row_var]]))) else table(DF[[row_var]], DF[[col_var]])
  tbl <- prop.table(tbl, 2)
  tbl <- round(tbl, 2) * 100
  tbl
}
col12 <- rbind(table_by("PurchaseInt", "Sex"), table_by("Relevance", "Sex"))
col34 <- rbind(table_by("PurchaseInt", "Age"), table_by("Relevance", "Age"))
col56 <- rbind(table_by("PurchaseInt", "Edu"), table_by("Relevance", "Edu"))
percent_rows <- cbind(col12, col34, col56)
whole_table <- cbind(
  rbind(table_by("PurchaseInt"), table_by("Relevance")),
  percent_rows
)
# should be the data you want
whole_table
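To make the two tricks inside `table_by()` easier to follow, here is a small self-contained sketch on made-up data (the vectors `answer` and `group` are invented for illustration and are not part of the survey example). It shows that `prop.table(x, 2)` divides each cell by its column total, and that `t(t(...))` turns a one-way table into a one-column matrix so the same margin argument still works.

```r
# Toy data, invented for illustration only
answer <- c("yes", "no", "yes", "yes", "no", "yes")
group  <- c("A",   "A",  "A",   "B",   "B",  "B")

# Two-way table: column proportions (margin = 2)
two_way <- table(answer, group)
col_pct <- prop.table(two_way, 2)
colSums(col_pct)                 # every column sums to 1

# One-way table: t(t()) makes it a matrix with a single column,
# so prop.table(..., 2) gives shares of the whole sample
one_way <- t(t(table(answer)))
dim(one_way)
total_pct <- prop.table(one_way, 2)
total_pct
```

Multiplying by 100 and rounding, as `table_by()` does, then turns these proportions into percentages.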
For the second part, you can use my huxtable package (there are other packages too):
library(huxtable)
wt_hux <- as_hux(whole_table, add_colnames = TRUE, add_rownames = TRUE)
number_format(wt_hux)[-2,] <- "%.0f%%"
number_format(wt_hux)[2,] <- "%.0f"
wt_hux[1, 1:2] <- c("", "Total")
wt_hux[2, 1] <- "Total"
wt_hux <- insert_row(wt_hux, c("", "Total", "Sex", "", "Age", "", "Edu", "", ""))
colspan(wt_hux)[1, c(3, 5, 7)] <- c(2, 2, 3)
align(wt_hux)[1, c(3, 5, 7)] <- "center"
wt_hux <- insert_column(wt_hux, c("", "", "Total", "PurchaseInt", "", "", "", "Relevance", "", "", ""))
rowspan(wt_hux)[c(4, 8), 1] <- 4
bottom_border(wt_hux)[c(1, 6, 10), ] <- 1 # for example
# should look roughly the way you want. You can print it to PDF or HTML:
wt_hux
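For writing the table to a file, a minimal sketch follows. It assumes the huxtable package is installed and uses its `quick_html()` helper; the small `demo` matrix and the file name are placeholders invented for this example, not values from the survey data. `quick_pdf()` works along the same lines but, as far as I know, needs a working LaTeX installation.

```r
# Sketch only: write a huxtable to a standalone HTML file.
# The matrix and the file name below are placeholders.
if (requireNamespace("huxtable", quietly = TRUE)) {
  demo <- matrix(c(52, 48, 47, 53), nrow = 2,
                 dimnames = list(c("Yes", "No"), c("Female", "Male")))
  demo_hux <- huxtable::as_hux(demo, add_colnames = TRUE, add_rownames = TRUE)

  # Write to a temporary directory; open = FALSE avoids launching a browser
  out_file <- file.path(tempdir(), "demo-table.html")
  huxtable::quick_html(demo_hux, file = out_file, open = FALSE)
}
```

The same call should work with `wt_hux` in place of `demo_hux`.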
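Answer 1 (score: 0)
Thank you very much for your answer! It was very helpful. I had not come across huxtable before.
With a few modifications I got it working the way I wanted. Here is the code: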
# a useful function
table_by <- function(row_var, col_var = NULL) {
  # the repeated t() below ensures you have a 4 x 1 matrix
  tbl <- if (is.null(col_var)) t(t(table(DF[[row_var]]))) else table(DF[[row_var]], DF[[col_var]])
  tbl <- prop.table(tbl, 2)
  tbl <- round(tbl, 2) * 100
  tbl
}
# HERE I also added a table showing counts for demographics
col12 <- rbind(table(DF$TOTAL), table_by("PurchaseInt", "TOTAL"), table_by("Relevance", "TOTAL"))
col34 <- rbind(table(DF$Sex), table_by("PurchaseInt", "Sex"), table_by("Relevance", "Sex"))
col56 <- rbind(table(DF$Age), table_by("PurchaseInt", "Age"), table_by("Relevance", "Age"))
col78 <- rbind(table(DF$Edu), table_by("PurchaseInt", "Edu"), table_by("Relevance", "Edu"))
# should be the data you want
whole_table <- cbind(col12, col34, col56, col78)
whole_table
library(huxtable)
wt_hux <- as_hux(whole_table, add_colnames = TRUE, add_rownames = TRUE)
number_format(wt_hux)[-2,] <- "%.0f%%"
number_format(wt_hux)[2,] <- "%.0f"
wt_hux[1, 1:2] <- c("", "Total")
wt_hux[2, 1] <- "Total"
wt_hux <- insert_row(wt_hux, c("", "Total", "Sex", "", "Age", "", "Edu", "", ""))
colspan(wt_hux)[1, c(3, 5, 7)] <- c(2, 2, 3)
align(wt_hux)[1, c(3, 5, 7)] <- "center"
wt_hux <- insert_column(wt_hux, c("", "", "Total", "PurchaseInt", "", "", "", "Relevance", "", "", ""))
rowspan(wt_hux)[c(4, 8), 1] <- 4
bottom_border(wt_hux)[c(2, 3, 7), ] <- 1 # for example
# should look roughly the way you want. You can print it to PDF or HTML:
wt_hux
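I wonder whether this could be wrapped into a function. I am not very good at writing R functions yet, but I would really like to have one, since I will probably need many tables like this with different rows and columns (this is only one example).
Cheers, Grzesiek.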
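As a possible starting point for wrapping this into a function, here is a minimal base-R sketch. The name `banner_table()` and its arguments are made up for illustration and come neither from huxtable nor from the posts above. It assumes every variable is categorical and that a "Total" column is obtained from a constant column such as `TOTAL` in the example data. The result is a plain matrix with a base row (N) on top and column percentages below, which can then be passed to `as_hux()` for formatting.

```r
# Hypothetical helper (not part of any package): stack column-percentage
# tables for several row variables across several column ("banner")
# variables, with a base-count row (N) on top.
banner_table <- function(data, row_vars, col_vars, digits = 0) {
  blocks <- lapply(col_vars, function(cv) {
    col_f <- factor(data[[cv]])

    # Base row: number of respondents in each column category
    base <- matrix(table(col_f), nrow = 1,
                   dimnames = list("N", levels(col_f)))

    # Column percentages for each row variable, stacked vertically
    pct <- lapply(row_vars, function(rv) {
      tbl <- table(factor(data[[rv]]), col_f)
      p <- round(prop.table(tbl, 2) * 100, digits)
      matrix(p, nrow = nrow(tbl),
             dimnames = list(paste(rv, rownames(tbl), sep = ": "),
                             colnames(tbl)))
    })

    rbind(base, do.call(rbind, pct))
  })

  # Put the blocks for the different column variables side by side
  do.call(cbind, blocks)
}

# Usage on simulated data shaped like the example in the question
set.seed(321)
n <- 200
survey <- data.frame(
  TOTAL = rep("TOTAL", n),
  Sex = sample(c("Male", "Female"), n, replace = TRUE),
  Age = sample(c("18-34", "35-59"), n, replace = TRUE),
  Edu = sample(c("Lower", "Middle", "Higher"), n, replace = TRUE),
  PurchaseInt = sample(c("Definitely yes", "Somewhat yes",
                         "Somewhat not", "Definitely not"), n, replace = TRUE),
  Relevance = sample(c("Definitely fits my needs", "Somewhat fits my needs",
                       "Somewhat does not fit", "Definitely does not fit"),
                     n, replace = TRUE)
)

report <- banner_table(survey,
                       row_vars = c("PurchaseInt", "Relevance"),
                       col_vars = c("TOTAL", "Sex", "Age", "Edu"))
report
```

Because the row and column variables are passed as character vectors, the same call can be reused for other tables by changing `row_vars` and `col_vars`. The spanning headers and borders from the huxtable code would still need to be adapted to the number of categories in each variable.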