Question

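I am trying to efficiently draw a series of bivariate bar plots. Each plot should show the frequency of cases for one of a series of demographic variables, split by gender. The code below works, but when the tidy column variable is created, its levels are all the levels of the different demographic variables combined. Because this is a new factor, R arranges the factor levels in its own alphabetical order. As you can see from the factor levels of "variable" and from the resulting plot, they end up in no meaningful order: the income categories are out of order, and so are the education levels.

In my real data set there are many more factor levels, so simply releveling variable by hand is not feasible. One option I considered was not melting the variables into variable at all and instead trying some version of summarise_each(), but I could not get that to work.

Thanks for your help.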

library(dplyr)
library(tidyr)
library(ggplot2)

# age variable
age <- sample(c('18 to 24', '25 to 45', '45+'), size = 100, replace = TRUE)
# gender variable
gender <- sample(c('M', 'F'), size = 100, replace = TRUE)
# income variable
income <- sample(c(10, 20, 30, 40, 50, 60, 70, 80, 100, 110), size = 100, replace = TRUE)
# education variable
education <- sample(c('High School', 'College', 'Elementary'), size = 100, replace = TRUE)
# tie together in df
df <- data.frame(age, gender, income, education)
# begin tidying
df %>%
  # tidy every column except gender
  gather(variable, value, -c(gender)) %>%
  # group by value, variable, then gender
  group_by(value, variable, gender) %>%
  # summarise to obtain table cell frequencies
  summarise(freq = n()) %>%
  # begin plotting: value (categories) on the x-axis, frequency on the y-axis,
  # gender as the grouping variable, original variable as the facet
  ggplot(aes(x = value, y = freq, group = gender)) +
  geom_bar(aes(fill = gender), stat = 'identity', position = 'dodge') +
  facet_wrap(~variable, scales = 'free_x')

Answer 1

Data

df$education <- factor(df$education, c("Elementary", "High School", "College"))
ddf <- df %>%
  gather(variable, value, -gender) %>%
  group_by(value, variable, gender) %>%
  summarise(freq = n())

Code

# collect the levels of every column except gender (column 2), in their original order
lvl <- unlist(lapply(df[, -2], function(.) levels(as.factor(.))))
# convert value to a factor that uses those levels
ddf$value <- factor(ddf$value, lvl)
ddf %>%
  ggplot(aes(x = value, y = freq, group = gender)) +
  geom_bar(aes(fill = gender), stat = 'identity', position = 'dodge') +
  facet_wrap(~variable, scales = 'free_x')

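Explanation

gather converts the values in education, income and age into a single character vector. ggplot then uses the canonical order of these values, i.e. alphabetical order. If you want them in a specific order, you should first convert the column to a factor and then assign the levels in the order you like (as you mentioned). Here I simply reused the ordering of the original levels (and silently converted the numeric income into a factor, so some adjustments to your code may be needed). It does show, however, that you do not have to hard-code any levels yourself, assuming the levels are already in the right order in the original data set.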
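To make the ordering issue concrete, here is a minimal base R sketch. The vector `value` and the order in `wanted` are made-up stand-ins for the gathered column, not taken from the data above. It compares the default levels of a character vector with levels supplied explicitly.

```r
# Made-up values standing in for the gathered `value` column (assumption).
value <- c("20", "100", "College", "10", "Elementary", "High School")

# Default: factor() sorts the unique strings, so "100" lands before "20"
# because the comparison is done on text, not on numbers.
default_levels <- levels(factor(value))

# Explicit levels: the factor keeps exactly the order that is passed in.
wanted <- c("10", "20", "100", "Elementary", "High School", "College")
ordered_levels <- levels(factor(value, levels = wanted))

print(default_levels)
print(ordered_levels)
```

The second call is the behaviour the answer relies on: once `value` is a factor with explicit levels, the plot follows that order instead of the sorted one.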

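So in your real case, what you should do is:

Convert the character vector value to a factor

Assign the levels in the order in which you want them displayed by ggplot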
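The two steps can be sketched on a small made-up data frame as follows. This is only an illustration under stated assumptions: `wide`, `long` and `demo_cols` are hypothetical names, the long format is built with base R so the sketch needs no packages, and the `unique()` call is an added safeguard for the case where two variables share a level label.

```r
# Small made-up data frame in the original wide shape (illustrative only).
wide <- data.frame(
  gender    = c("M", "F", "F", "M"),
  income    = c(100, 20, 10, 20),
  education = factor(c("College", "Elementary", "High School", "College"),
                     levels = c("Elementary", "High School", "College")),
  stringsAsFactors = FALSE
)

# Stack the demographic columns into a character `value` column,
# similar in shape to what gather() returns.
demo_cols <- c("income", "education")
long <- do.call(rbind, lapply(demo_cols, function(col) {
  data.frame(gender   = wide$gender,
             variable = col,
             value    = as.character(wide[[col]]),
             stringsAsFactors = FALSE)
}))

# Collect the levels of each original column in their existing order.
lvl <- unlist(lapply(wide[demo_cols], function(col) levels(as.factor(col))),
              use.names = FALSE)
lvl <- unique(lvl)  # factor() rejects duplicated levels

# Step 1 and 2 together: convert `value` to a factor with those levels.
long$value <- factor(long$value, levels = lvl)
print(levels(long$value))
```

Numeric columns such as income are sorted numerically by `as.factor()` before being turned into labels, which is why "100" ends up after "20" here.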

Easily reorder factor levels after tidying or melting
