This is probably a simple question for some people.
I have 2 data frames:
dput(head(Activitieslessthan35))
structure(list(`Main job: Working time in main job` = c(470,
440, 430, 430, 410, 150), Sleep = c(420, 450, 450, 420, 450,
460), `Unspecified TV video or DVD watching` = c(60, 40, 210,
190, 60, 0), Eating = c(80, 60, 40, 70, 60, 130), `Other personal care:Wash and dress` = c(60,
60, 50, 50, 70, 50), `Travel to work from home and back only` = c(60,
60, 50, 90, 90, 30), `Unspecified radio listening` = c(140, 180,
50, 90, 140, 160), `Other specified social life` = c(350, 270,
310, 330, 710, 440), `Socialising with family` = c(350, 270,
360, 330, 730, 540), `Food preparation and baking` = c(410, 310,
420, 380, 1000, 950)), row.names = c(NA, 6L), class = "data.frame")
and
dput(head(ActivitiesMoreOrEqual35))
structure(list(`Main job: Working time in main job` = c(360,
420, 390, 490, 540, 390), Sleep = c(590, 480, 310, 560, 280,
370), `Unspecified TV video or DVD watching` = c(100, 60, 130,
120, 60, 30), Eating = c(70, 100, 70, 40, 190, 80), `Other personal care:Wash and dress` = c(10,
30, 100, 60, 270, 90), `Travel to work from home and back only` = c(0,
50, 260, 50, 0, 0), `Unspecified radio listening` = c(50, 80,
260, 80, 210, 200), `Other specified social life` = c(190, 320,
790, 250, 580, 420), `Travel in the course of work` = c(50, 80,
260, 70, 120, 200), `Food preparation and baking` = c(440, 570,
820, 570, 820, 590)), row.names = c(NA, 6L), class = "data.frame")
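I would like to convert the data.frames into factors: for example, a factor variable named Activitieslessthan35 whose levels are the columns of the data frame, such as 'Main job: Working time in main job', 'Sleep', etc. Later, I would also like to plot the (summed) levels of the factors in a side-by-side bar chart.
I don't know whether a data.frame can be converted into a factor variable, or how to change the format of the data.frame to create the plot.
Any suggestions are welcome.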
Answer 0: (score: 1)
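If I understand correctly, you want to reshape both data frames into a long format with two columns, one holding the column names of the data frame and the other holding all the values. You then want to sum the values for each "factor" in the first column, merge the two data frames, and plot both in a single chart. Is that right?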
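Here is one way to do it. I will call the data frame Activitieslessthan35 df and the data frame ActivitiesMoreOrEqual35 df2.
First, we reshape each data frame into long format with pivot_longer: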
library(tidyr)
library(dplyr)
df <- df %>% pivot_longer(everything(), names_to = "Activities", values_to = "Values_less_than35")
df2 <- df2 %>% pivot_longer(everything(),names_to = "Activities", values_to = "Values_More_than_35")
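To make the reshaping step concrete, here is a minimal sketch on a tiny frame. The names toy and toy_long are made up for illustration, and the values are just the first two rows of two columns from the question's data.

```r
library(tidyr)

# Hypothetical two-row frame holding two of the activity columns
toy <- data.frame(Sleep = c(420, 450), Eating = c(80, 60))

# everything() selects all columns; each cell becomes one row:
# the column name goes to "Activities", the cell value to "Values_less_than35"
toy_long <- pivot_longer(toy, everything(),
                         names_to = "Activities",
                         values_to = "Values_less_than35")

print(toy_long)
```

A 2 x 2 frame becomes a 4 x 2 frame, so the activity names are now ordinary data that can be grouped on.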
Then we compute the sum for each activity in each of your data frames:
df_sum = df%>% group_by(Activities) %>% summarise(Values_less_than35 = sum(Values_less_than35))
df2_sum = df2 %>% group_by(Activities) %>% summarise(Values_More_than_35 = sum(Values_More_than_35))
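As a quick sanity check of the summing step, the sketch below runs the same pipeline on two columns copied from the head() of Activitieslessthan35 shown in the question. The names sample_df and sample_sum are hypothetical.

```r
library(dplyr)
library(tidyr)

# Two columns taken from the six rows printed in the question
sample_df <- data.frame(
  Sleep  = c(420, 450, 450, 420, 450, 460),
  Eating = c(80, 60, 40, 70, 60, 130)
)

# Long format first, then one summed row per activity
sample_sum <- sample_df %>%
  pivot_longer(everything(),
               names_to = "Activities",
               values_to = "Values_less_than35") %>%
  group_by(Activities) %>%
  summarise(Values_less_than35 = sum(Values_less_than35))

print(sample_sum)
```

For these six rows the totals are 440 for Eating and 2650 for Sleep. If the full data contains missing values, sum(..., na.rm = TRUE) may be what you want instead.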
Then we merge the two data frames into a single data frame, using "Activities" as the merge column:
final_df = merge(df_sum,df2_sum, by.x = "Activities", by.y = "Activities", all = TRUE)
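Note that the two data frames do not share exactly the same columns: 'Socialising with family' appears only in the first and 'Travel in the course of work' only in the second. The sketch below shows what all = TRUE does in that case. The frames left, right and merged are hypothetical, and the totals are the sums of the six head() rows from the question.

```r
# Summed values for activities present in each group (from the head() rows)
left <- data.frame(
  Activities = c("Sleep", "Socialising with family"),
  Values_less_than35 = c(2650, 2580)
)
right <- data.frame(
  Activities = c("Sleep", "Travel in the course of work"),
  Values_More_than_35 = c(2590, 780)
)

# all = TRUE keeps activities found in either frame (a full outer join);
# the side that lacks an activity is filled with NA
merged <- merge(left, right, by = "Activities", all = TRUE)

print(merged)
```

The result has three rows, with NA where an activity is missing from one group. ggplot2 drops those rows with a warning when plotting; if you would rather show them as zero, something like merged[is.na(merged)] <- 0 is one option, though that changes what the chart claims.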
Finally, we reshape final_df into long format one more time so that the values have the right shape for plotting with ggplot2:
final_df <- final_df %>% pivot_longer(., -Activities, names_to = "Variable", values_to = "Value")
Now we can plot it with ggplot2:
library(ggplot2)
# geom_col() always uses stat = "identity", so that argument is not passed here
ggplot(final_df, aes(x = stringr::str_wrap(Activities, 15), y = Value, fill = Variable)) +
  geom_col(position = position_dodge()) +
  coord_flip() +
  xlab("")
Does this look like what you were expecting?