Calculating the rate of each category in one factor variable, grouped by the categories of another factor

Time: 2018-11-19 01:09:30

Tags: r ggplot2 group-by

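I have two columns, both of which are factor variables. The first is the prisoner's race, and the second is whether they recidivated. I want to break down the recidivism rate by race. How should I do this?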

This is what I have tried:

library(dplyr)
library(ggplot2)

df %>%
  group_by(race, Recidivated) %>%
  summarize(count = n()) %>%
  arrange(-count) %>%
  ggplot(aes(reorder(race, count, FUN = max),
             count, fill = race)) + 
  geom_col() +
  coord_flip() +
  scale_fill_manual(values=palette_9_colors) +
  theme(legend.position = "none") +
  labs(x = "Charge", y = "Count",
       title="Recidivism by Rates",
       subtitle= "Broward County - Source: Propublica",
       caption="UrbanSpatialAnalysis.com") +
  plotTheme()   

The result is a bar chart that shows the count for each race. How can I get a chart that visualizes the recidivism rate by race? Thanks!!!

Here is some data!

    > head(df)
   sex age         age_cat             race priors_count two_year_recid
1 Male  69 Greater than 45            Other            0              0
2 Male  34         25 - 45 African-American            0              1
3 Male  24    Less than 25 African-American            4              1
4 Male  44         25 - 45            Other            0              0
5 Male  41         25 - 45        Caucasian           14              1
6 Male  43         25 - 45            Other            3              0
                   r_charge_desc                  c_charge_desc
1                                  Aggravated Assault w/Firearm
2    Felony Battery (Dom Strang) Felony Battery w/Prior Convict
3    Driving Under The Influence          Possession of Cocaine
4                                                       Battery
5 Poss of Firearm by Convic Felo      Possession Burglary Tools
6                                         arrest case no charge
  c_charge_degree r_charge_degree juv_other_count length_of_stay
1               F                               0              1
2               F            (F3)               0             10
3               F            (M1)               1              1
4               M                               0              1
5               F            (F2)               0              6
6               F                               0              1
    Recidivated
1 notRecidivate
2    Recidivate
3    Recidivate
4 notRecidivate
5    Recidivate
6 notRecidivate

1 Answer:

Answer 0: (score: 0)

library(ggplot2)

ggplot(data = ideaths, aes(x = age_group, y = deaths, fill = fyear)) +
  geom_col(position = position_dodge(width = 0.9)) +
  geom_text(aes(x = age_group, y = deaths + 3, label = deaths), 
            position = position_dodge(width = 0.9)) +
  ggtitle("Figure 8.") +
  scale_fill_manual(values = c("#7F7F7F", "#94D451")) +
  scale_y_continuous(breaks = seq(0, 55, 5)) + 
  theme_light() +
  theme(
    panel.border = element_blank(), 
    panel.grid.major.x = element_blank(), 
    panel.grid.minor.y = element_blank(), 
    panel.grid.major.y = element_line(size = .1, color = "grey"), 
    axis.title = element_blank(), legend.position = "bottom", 
    legend.title = element_blank(), plot.title = element_text(size = 10)
  )

If Recidivated is a logical variable, it should hold TRUE or FALSE; for a logical vector, mean() gives the proportion of TRUE values.
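As a rough illustration of the mean() idea applied to the question's columns, here is a minimal sketch. It assumes the column names race and Recidivated and the level "Recidivate" from the head(df) output; the toy data frame just reuses those six rows and is not a meaningful sample, and the names rates and p are made up for this example. Comparing the factor to "Recidivate" produces a logical vector, so mean() within each race group gives that group's rate.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the asker's df: only the six rows shown by head(df),
# so the resulting rates are illustrative, not real estimates.
df <- data.frame(
  race = factor(c("Other", "African-American", "African-American",
                  "Other", "Caucasian", "Other")),
  Recidivated = factor(c("notRecidivate", "Recidivate", "Recidivate",
                         "notRecidivate", "Recidivate", "notRecidivate"))
)

# The comparison yields TRUE/FALSE; the mean of a logical vector
# is the proportion of TRUE values, i.e. the recidivism rate per race.
rates <- df %>%
  group_by(race) %>%
  summarize(rate = mean(Recidivated == "Recidivate"), n = n())

# Bar chart of the rate (not the count) for each race, ordered by rate.
p <- ggplot(rates, aes(x = reorder(race, rate), y = rate, fill = race)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(labels = scales::percent) +
  theme(legend.position = "none") +
  labs(x = "Race", y = "Recidivism rate")
```

Calling print(p) draws the chart. Keeping n in the summary makes it easy to spot groups whose rate rests on very few observations.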

Hope this helps :)