Reproducible example:
library(ggplot2)
library(haven)
library(dplyr)

# Load the Stata file
data <- read_dta('http://dl.dropboxusercontent.com/s/s7zqb2e0avyp1gk/nswd_old_12.dta')

# Code each row: 0 when sample == 1, otherwise 1 if treat == 1, else 2
data$treatedOrNonSample <- ifelse(data$sample == 1, 0, ifelse(data$treat == 1, 1, 2))

# Drop the rows coded 2
treatOrCPS <- subset(data, treatedOrNonSample != 2)

# Probit model for the propensity score
m_ps <- glm(treat ~ age + age2 + ed + hisp + married + nodeg + re74 + re75 + black,
            family = binomial(link = "probit"), data = treatOrCPS)

# Predicted probabilities together with the observed treatment status
prs_df <- data.frame(pr_score = predict(m_ps, type = "response"),
                     treated = m_ps$model$treat)

labs <- paste("Status:", c("Treated", "CPS Sample"))

# Histogram of propensity scores, one facet per group
prs_df %>%
  mutate(treated = ifelse(treated == 1, labs[1], labs[2])) %>%
  ggplot(aes(x = pr_score)) +
  geom_histogram(color = "white") +
  facet_wrap(~treated) +
  xlab("Probability of being in Treatment Condition") +
  theme_bw()
I get the following:
Why is there no data in the rightmost facet? I'm losing my mind over this seemingly trivial issue, so any help would be appreciated.
Answer 0: (score: 1)
Removing color = "white" makes the data visible:
prs_df %>%
  mutate(treated = ifelse(treated == 1, labs[1], labs[2])) %>%
  ggplot(aes(x = pr_score)) +
  geom_histogram() +
  facet_wrap(~treated) +
  xlab("Probability of being in Treatment Condition") +
  theme_bw()
However, because the counts in the 'CPS Sample' group are so much higher, the data in the 'Treated' group is still barely visible.
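One way to make both panels readable might be to give each facet its own y-axis with scales = "free_y" in facet_wrap(). The following is a minimal sketch rather than a tested fix for the original file: it assumes simulated data in place of prs_df, and the name sim_df, the group sizes, and the beta parameters are all made up for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for prs_df: a small "Treated" group and a much
# larger "CPS Sample" group (sizes and distributions are arbitrary).
set.seed(1)
sim_df <- data.frame(
  pr_score = c(rbeta(200, 2, 3), rbeta(15000, 1, 20)),
  treated  = rep(c("Status: Treated", "Status: CPS Sample"),
                 times = c(200, 15000))
)

# scales = "free_y" lets each facet choose its own y-axis range, so the
# small group is no longer squashed by the large group's counts.
p <- ggplot(sim_df, aes(x = pr_score)) +
  geom_histogram(bins = 30) +
  facet_wrap(~treated, scales = "free_y") +
  xlab("Probability of being in Treatment Condition") +
  theme_bw()

print(p)
```

With free y scales the bar heights are no longer comparable across panels, so the counts have to be read from each facet's own axis.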