Efficiently subsetting tables using factor variables in R

Time: 2015-10-04 23:05:30

Tags: r

I am trying to create tables from survey data, but the solution I came up with is unmanageable for the number of tables I need to create.

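I ran a survey covering different populations, political parties, and their opinions on certain issues. Below are sample data and my (almost) working, cumbersome solution. The result I am looking for is laid out in the "ideal.table" data.frame (shown below).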

pop <- c("elite", "elite", "public", "public", "public", "public")
party <- c("D", "R", "R", "D", "D", "R")
opinion <- c("pro", "con", "pro", "con", "pro", "pro")

df <- data.frame(pop, party, opinion)

party.table <- prop.table(table(df[df$pop=="public",][["party"]], df[df$pop=="public",][["opinion"]]),2)
elite.table <- prop.table(table(df[df$pop=="elite",][["opinion"]]))
public.table <- prop.table(table(df[df$pop=="public",][["opinion"]]))

group <- c("R", "D", "elite", "public")
percent.pro <- c(0.3, 0.6, 0.5, 0.75)
percent.con <- c(0.7, 0.4, 0.5, 0.25)

ideal.table <- data.frame(group, percent.pro, percent.con)

library(dplyr)
library(tidyr)

# create data frames from tables
x = data.frame(elite.table)
names(x) = c("elite","value")

y = data.frame(party.table) %>% spread(Var2,Freq)
names(y)[1] = "group"

z = data.frame(public.table)
names(z)[1] = "group"

# join data frames
x %>% inner_join(y, by="group") %>% inner_join(z, by="group")

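I have not gotten this to work yet, but even if I found a solution for this particular dataset, I will sometimes need to combine several two-dimensional tables rather than just the groups shown here. Is there a better way to get cross-tab proportions for different subsets of the data?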

   group percent.pro percent.con
1      R        0.30        0.70
2      D        0.60        0.40
3  elite        0.50        0.50
4 public        0.75        0.25

Thanks for your help!

1 answer:

Answer 0: (score: 1)

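library(dplyr)
library(tidyr)

df %>%
  # stack pop and party into a single "group" column, keeping opinion alongside
  gather(variable, group, -opinion) %>%
  group_by(variable, group) %>%
  # share of "pro" answers within each group
  summarize(percent.pro = sum(opinion == "pro") / n()) %>%
  mutate(percent.con = 1 - percent.pro)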
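
Note that the pipeline above computes the party shares over all respondents, whereas the question's party.table was restricted to the "public" subset. If that restriction matters, one possible base R sketch is below. It assumes that row proportions (pro plus con summing to 1 within each group) are what the ideal table is meant to show, and `prop_by` is a hypothetical helper name introduced here for illustration, not something from the question.

```r
# Sample data from the question
pop <- c("elite", "elite", "public", "public", "public", "public")
party <- c("D", "R", "R", "D", "D", "R")
opinion <- c("pro", "con", "pro", "con", "pro", "pro")
df <- data.frame(pop, party, opinion)

# Hypothetical helper: proportion of each opinion within every level
# of one grouping column (row proportions of the cross-tab).
prop_by <- function(data, by) {
  tab <- prop.table(table(data[[by]], data$opinion), margin = 1)
  data.frame(
    group = rownames(tab),
    percent.pro = as.vector(tab[, "pro"]),
    percent.con = as.vector(tab[, "con"])
  )
}

# Stack one small table per subset/grouping combination
result <- rbind(
  prop_by(df[df$pop == "public", ], "party"),  # parties among the public only
  prop_by(df, "pop")                           # elite vs. public overall
)
result
```

Each additional table then costs one more `prop_by()` call inside `rbind()`, so the subset and the grouping column can vary independently. The helper expects both "pro" and "con" to occur in the data passed to it.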