How to create a factor variable from a data.frame and plot the columns in a side-by-side plot

Time: 2019-12-01 19:15:49

Tags: r dataframe ggplot2

This may be a simple question for some.

I have 2 data frames:

dput(head(Activitieslessthan35))

structure(list(`Main job: Working time in main job` = c(470, 
440, 430, 430, 410, 150), Sleep = c(420, 450, 450, 420, 450, 
460), `Unspecified TV video or DVD watching` = c(60, 40, 210, 
190, 60, 0), Eating = c(80, 60, 40, 70, 60, 130), `Other personal care:Wash and dress` = c(60, 
60, 50, 50, 70, 50), `Travel to work from home and back only` = c(60, 
60, 50, 90, 90, 30), `Unspecified radio listening` = c(140, 180, 
50, 90, 140, 160), `Other specified social life` = c(350, 270, 
310, 330, 710, 440), `Socialising with family` = c(350, 270, 
360, 330, 730, 540), `Food preparation and baking` = c(410, 310, 
420, 380, 1000, 950)), row.names = c(NA, 6L), class = "data.frame")

dput(head(ActivitiesMoreOrEqual35))

structure(list(`Main job: Working time in main job` = c(360, 
420, 390, 490, 540, 390), Sleep = c(590, 480, 310, 560, 280, 
370), `Unspecified TV video or DVD watching` = c(100, 60, 130, 
120, 60, 30), Eating = c(70, 100, 70, 40, 190, 80), `Other personal care:Wash and dress` = c(10, 
30, 100, 60, 270, 90), `Travel to work from home and back only` = c(0, 
50, 260, 50, 0, 0), `Unspecified radio listening` = c(50, 80, 
260, 80, 210, 200), `Other specified social life` = c(190, 320, 
790, 250, 580, 420), `Travel in the course of work` = c(50, 80, 
260, 70, 120, 200), `Food preparation and baking` = c(440, 570, 
820, 570, 820, 590)), row.names = c(NA, 6L), class = "data.frame")

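I want to convert the data.frames into factors. For example, I would like a factor variable named Activitieslessthan35 whose levels are the columns of the data frame, such as 'Main job: Working time in main job', 'Sleep', etc. Later, I also want to plot the levels of the factors (their sums) in a side-by-side bar plot.

I don't know whether converting the data.frame into a factor variable is what you would do, or how to change the format of the data.frame to create the plot.

Any suggestions are welcome.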

1 Answer:

Answer 0: (score: 1)

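If I understand correctly, you want to reshape both data frames into a long format with two columns, one containing the column names of the data frame and the other containing all the values. You then want to sum the values for each "factor" in the first column, merge the two data frames, and plot both in a single plot. Is that right?

Here is one way to do it. I will call the data frame Activitieslessthan35 df and the data frame ActivitiesMoreOrEqual35 df2.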

First, we reshape each data frame into long format using pivot_longer:

library(tidyr)
library(dplyr)
df <- df %>% pivot_longer(everything(), names_to = "Activities", values_to = "Values_less_than35")
df2 <- df2 %>% pivot_longer(everything(),names_to = "Activities", values_to = "Values_More_than_35")
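To make the reshape concrete, here is a minimal sketch on a tiny data frame. The object names toy and toy_long and the two-row data are made up for illustration; only pivot_longer and its names_to and values_to arguments come from the step above.

```r
library(tidyr)

# Hypothetical two-row data frame, used only to illustrate the reshape
toy <- data.frame(Sleep = c(420, 450), Eating = c(80, 60))

# Every column name goes into "Activities", every cell into "Values"
toy_long <- pivot_longer(toy, everything(),
                         names_to = "Activities",
                         values_to = "Values")

# One row per original cell: 2 rows x 2 columns = 4 rows
print(toy_long)
```

Each original column name now appears as a value in the Activities column, which is the shape that group_by and ggplot2 work with most easily.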

Then we compute the sum for each activity in each of your data frames:

df_sum <- df %>% group_by(Activities) %>% summarise(Values_less_than35 = sum(Values_less_than35))
df2_sum <- df2 %>% group_by(Activities) %>% summarise(Values_More_than_35 = sum(Values_More_than_35))
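As a sanity check, the per-activity sums should match the column sums of the original wide data frame. The sketch below compares the two on a made-up data frame; toy, toy_sum and wide_sum are hypothetical names, and colSums is a base R alternative that is not part of the approach above.

```r
library(tidyr)
library(dplyr)

# Hypothetical wide data frame, used only for the comparison
toy <- data.frame(Sleep = c(420, 450), Eating = c(80, 60))

# Long-format route: reshape, group by activity, then sum
toy_sum <- toy %>%
  pivot_longer(everything(), names_to = "Activities", values_to = "Values") %>%
  group_by(Activities) %>%
  summarise(Values = sum(Values))

# Base R route: column sums of the wide data
wide_sum <- colSums(toy)

# Compare both results in the same activity order
same <- all.equal(toy_sum$Values, unname(wide_sum[toy_sum$Activities]))
print(same)
```

If only the sums are needed, colSums on the wide data may be enough on its own; the long format pays off once the two groups have to be combined and plotted.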

Then we merge the two data frames into a single data frame, using "Activities" as the merge column.

final_df <- merge(df_sum, df2_sum, by = "Activities", all = TRUE)
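Note that the two sample data frames do not have exactly the same columns: 'Socialising with family' appears only in the first and 'Travel in the course of work' only in the second. With all = TRUE, merge keeps such activities and fills the missing side with NA. A small sketch with made-up values (a, b, merged and merged_zero are hypothetical names):

```r
# Hypothetical summaries; each has one activity the other lacks
a <- data.frame(Activities = c("Eating", "Sleep", "Socialising with family"),
                Values_less_than35 = c(440, 2650, 2580))
b <- data.frame(Activities = c("Eating", "Sleep", "Travel in the course of work"),
                Values_More_than_35 = c(550, 2590, 780))

# Full outer join: unmatched activities are kept and padded with NA
merged <- merge(a, b, by = "Activities", all = TRUE)
print(merged)

# Optional, and an assumption about what you want:
# treat a missing activity as zero minutes
merged_zero <- merged
merged_zero[is.na(merged_zero)] <- 0
```

Whether replacing NA with 0 is appropriate depends on what a missing column means in your data. If the NA values are left in place, ggplot2 will drop those rows with a warning when plotting.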

Finally, we reshape final_df one last time so that the values are in the right shape for plotting with ggplot2:

final_df <- final_df %>% pivot_longer(-Activities, names_to = "Variable", values_to = "Value")

Now we can plot the final data frame using ggplot2:

library(ggplot2)
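# geom_col() already uses stat = "identity", so the argument is not passed again
# str_wrap() breaks long activity names across lines; coord_flip() makes the bars horizontal
ggplot(final_df, aes(x = stringr::str_wrap(Activities, 15), y = Value, fill = Variable)) +
  geom_col(position = position_dodge()) +
  coord_flip() +
  xlab("")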

You then get a plot with the two groups shown as side-by-side bars for each activity.

Does this look like what you expected?
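On the factor part of the question: ggplot2 treats a character column on a discrete axis like a factor with alphabetically ordered levels, so an explicit conversion is only needed when you want to control the order of the bars. A minimal sketch, assuming you would like the activities ordered by their total value; toy_final and its numbers are made up for illustration.

```r
library(ggplot2)

# Hypothetical long-format data in the same shape as final_df
toy_final <- data.frame(
  Activities = rep(c("Eating", "Sleep", "Travel"), each = 2),
  Variable   = rep(c("Values_less_than35", "Values_More_than_35"), times = 3),
  Value      = c(440, 550, 2650, 2590, 330, 360)
)

# Total per activity, then use the ascending order as the factor levels
totals <- tapply(toy_final$Value, toy_final$Activities, sum)
toy_final$Activities <- factor(toy_final$Activities,
                               levels = names(sort(totals)))

# Same plot as before, now following the factor level order
p <- ggplot(toy_final, aes(x = Activities, y = Value, fill = Variable)) +
  geom_col(position = position_dodge()) +
  coord_flip() +
  xlab("")
print(p)
```

With coord_flip, the first factor level is drawn at the bottom, so the activity with the largest total ends up at the top of the plot.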