Using ggplot2 to show relationships between factor variables

Time: 2019-02-28 04:31:43

Tags: r ggplot2

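I am trying to use ggplot() to look at the relationship between happiness and a number of other variables, such as AGE, SEX, or MARITAL STATUS. I have this data set: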

https://xdaiisu.github.io/ds202materials/hwlabs/HAPPY.rds

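library(ggplot2)
library(dplyr)  # provides %>%, mutate() and arrange() used below

# Assumes the file has been downloaded to the working directory
HAPPY <- readRDS("HAPPY.rds")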

HAPPY[HAPPY == "IAP"] <- NA
HAPPY[HAPPY == "DK"] <- NA
HAPPY[HAPPY == "NA"] <- NA

I downloaded this data set and used the following code to convert some variables to factors; I will use only MARITAL and HAPPY as examples:

# The pipe must end a line, not start one; a leading %>% is a parse error
HAPPY <- HAPPY %>%
  mutate(MARITAL = factor(MARITAL,
                          levels = c("NEVER MARRIED", "MARRIED", "SEPARATED", "DIVORCED", "WIDOWED"))) %>%
  arrange(desc(MARITAL))

HAPPY <- HAPPY %>%
  mutate(HAPPY = factor(HAPPY,
                        levels = c("NOT TOO HAPPY", "PRETTY HAPPY", "VERY HAPPY"))) %>%
  arrange(desc(HAPPY))

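Now I would like to use a ggplot2 plot to show the relationship between MARITAL and happiness (represented by the HAPPY column). I am new to ggplot2, so I am still just experimenting with how to use it. If you would rather not do HAPPY vs. MARITAL, feel free to compare any other variable or column with HAPPY, since I keep getting errors.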

Thanks!

2 answers:

Answer 0: (score: 1)

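A starting point could be simply to visualize the number of observations in each combination, for example: ggplot(HAPPY, aes(x = HAPPY, y = MARITAL)) + geom_count()

You could also try geom_bin2d: https://ggplot2.tidyverse.org/reference/geom_bin2d.html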
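To make the geom_count() suggestion concrete, here is a minimal sketch on a small made-up data frame. The `toy` data and its values are hypothetical stand-ins for the real HAPPY data set; only the factor levels are taken from the question.

```r
library(ggplot2)

# Hypothetical toy data standing in for HAPPY.rds (values are invented)
toy <- data.frame(
  HAPPY = factor(
    c("VERY HAPPY", "PRETTY HAPPY", "PRETTY HAPPY",
      "NOT TOO HAPPY", "VERY HAPPY", "PRETTY HAPPY"),
    levels = c("NOT TOO HAPPY", "PRETTY HAPPY", "VERY HAPPY")
  ),
  MARITAL = factor(
    c("MARRIED", "MARRIED", "DIVORCED",
      "DIVORCED", "MARRIED", "NEVER MARRIED"),
    levels = c("NEVER MARRIED", "MARRIED", "SEPARATED", "DIVORCED", "WIDOWED")
  )
)

# One point per observed HAPPY/MARITAL combination, sized by its count
p_count <- ggplot(toy, aes(x = HAPPY, y = MARITAL)) +
  geom_count() +
  labs(x = "Happiness", y = "Marital status", size = "Observations")

p_count
```

With the real data, replacing `toy` with the HAPPY data frame should give the same kind of plot, with point size showing how many respondents fall into each combination.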

Answer 1: (score: 0)

The following code will help you get started.

#Loading Libraries
library(ggplot2)
library(dplyr)
library(ggthemes)
#reading data
df <- readRDS("HAPPY.rds")
df<- na.omit(df) #deleting NA's

#converting class of categorical columns from  character to factors 
df[sapply(df, is.character)] <- lapply(df[sapply(df, is.character)],as.factor)
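# AGE may have been turned into a factor above; convert via character so the
# actual ages are used rather than the factor codes (non-numeric labels become NA)
df$AGE <- as.numeric(as.character(df$AGE))
#Grouping through dplyr and plotting through ggplot2
df %>% 
  group_by(HAPPY,SEX) %>%
  summarise(mean_age=mean(AGE, na.rm = TRUE))%>%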
  ggplot(aes(x=HAPPY,y=mean_age,fill=SEX))+
  geom_bar( stat="identity",position = position_dodge())+
  labs(x="Happiness", y="Average Age")+
  theme_gdocs()+
  geom_text(aes(label=paste(round(mean_age,0)) ), vjust=0,position = position_dodge(0.9))+
  scale_fill_manual( values=c( "deeppink","mediumturquoise"))

[Image: output plot]
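For the HAPPY vs. MARITAL comparison asked about in the question, one possible approach (not part of either answer) is a proportional stacked bar chart. The sketch below uses a small hypothetical `toy` data frame in place of the real data; geom_bar(position = "fill") rescales each bar to 1, so the happiness shares can be compared across marital groups of different sizes.

```r
library(ggplot2)

# Hypothetical toy data standing in for HAPPY.rds (values are invented)
toy <- data.frame(
  HAPPY = factor(
    c("VERY HAPPY", "PRETTY HAPPY", "PRETTY HAPPY",
      "NOT TOO HAPPY", "VERY HAPPY", "PRETTY HAPPY"),
    levels = c("NOT TOO HAPPY", "PRETTY HAPPY", "VERY HAPPY")
  ),
  MARITAL = factor(
    c("MARRIED", "MARRIED", "DIVORCED",
      "DIVORCED", "MARRIED", "NEVER MARRIED"),
    levels = c("NEVER MARRIED", "MARRIED", "SEPARATED", "DIVORCED", "WIDOWED")
  )
)

# Each bar is one marital status; segments show the share of each happiness level
p_fill <- ggplot(toy, aes(x = MARITAL, fill = HAPPY)) +
  geom_bar(position = "fill") +
  labs(x = "Marital status", y = "Proportion", fill = "Happiness")

p_fill
```

Using position = "dodge" instead of "fill" would show raw counts side by side, which may be preferable when group sizes themselves are of interest.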