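I have the following data.frame, which contains 3 categorical variables (different types of vascular pathology) and 1 continuous variable (Output). I am interested in the relationship between Output and the different types of vascular pathology, i.e. is higher/lower output associated with mild/severe pathology?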
> dput(df)
structure(list(Vascular_Pathology_M = structure(c(1L, 2L, 3L,
1L, 1L, 2L, 4L, 3L, 1L, 2L), .Label = c("Absent", "Mild", "Mild/Moderate",
"Moderate/Severe", "Severe"), class = "factor"), Vascular_Pathology_F = structure(c(4L,
2L, 1L, 1L, 1L, 1L, 2L, 4L, 1L, 1L), .Label = c("Absent", "Mild",
"Mild/Moderate", "Moderate/Severe", "Severe"), class = "factor"),
Vascular_Pathology_O = structure(c(1L, 3L, 4L, 3L, 1L, 2L,
1L, 1L, 1L, 2L), .Label = c("Absent", "Mild", "Mild/Moderate",
"Moderate/Severe"), class = "factor"), Output = c(1.01789418758932,
1.05627630598801, 1.49233946102323, 1.38192374975672, 1.13097652937671,
0.861306979571144, 0.707820561413699, 1.16628243128399, 0.983163398006992,
1.23972603843843)), .Names = c("Vascular_Pathology_M", "Vascular_Pathology_F",
"Vascular_Pathology_O", "Output"), row.names = c(1L, 3L, 4L,
5L, 6L, 7L, 8L, 10L, 11L, 12L), class = "data.frame")
> df
   Vascular_Pathology_M Vascular_Pathology_F Vascular_Pathology_O    Output
 1               Absent      Moderate/Severe               Absent 1.0178942
 3                 Mild                 Mild        Mild/Moderate 1.0562763
 4        Mild/Moderate               Absent      Moderate/Severe 1.4923395
 5               Absent               Absent        Mild/Moderate 1.3819237
 6               Absent               Absent               Absent 1.1309765
 7                 Mild               Absent                 Mild 0.8613070
 8      Moderate/Severe                 Mild               Absent 0.7078206
10        Mild/Moderate      Moderate/Severe               Absent 1.1662824
11               Absent               Absent               Absent 0.9831634
12                 Mild               Absent                 Mild 1.2397260
Answer 0 (score: 1)
You can look at the interaction of the different pathologies. For example, with a bar chart:
library(dplyr)
library(ggplot2)
## Make the interaction variable (one level per combination of the three pathologies)
df$interact <- interaction(df[, 1:3], sep="_")
## Look at means of groups
means <- df %>%
  group_by(interact) %>%
  dplyr::summarise(Output = mean(Output))
## One bar per observed combination; rotate the x labels so they stay readable
ggplot(means, aes(interact, Output)) +
  geom_bar(stat="identity") +
  theme(axis.text.x=element_text(angle=90, hjust=1)) +
  xlab("Interaction")
Or with points:
ggplot(df, aes(interact, Output))+
geom_point() +
theme(axis.text=element_text(angle=45, hjust=1)) +
xlab("Interaction") +
geom_point(data=means, col="red") +
ylim(0, 1.6)
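With only 10 rows, most combinations of the three pathologies occur just once, so each bar or red point above summarises very few observations. A minimal sketch for checking this, assuming the data from the question is rebuilt by hand (the name `counts` is only illustrative):

```r
# Rebuild the question's data (same values as the dput output)
lv <- c("Absent", "Mild", "Mild/Moderate", "Moderate/Severe", "Severe")
df <- data.frame(
  Vascular_Pathology_M = factor(lv[c(1, 2, 3, 1, 1, 2, 4, 3, 1, 2)], levels = lv),
  Vascular_Pathology_F = factor(lv[c(4, 2, 1, 1, 1, 1, 2, 4, 1, 1)], levels = lv),
  Vascular_Pathology_O = factor(lv[c(1, 3, 4, 3, 1, 2, 1, 1, 1, 2)], levels = lv[1:4]),
  Output = c(1.01789418758932, 1.05627630598801, 1.49233946102323,
             1.38192374975672, 1.13097652937671, 0.861306979571144,
             0.707820561413699, 1.16628243128399, 0.983163398006992,
             1.23972603843843)
)

# drop = TRUE keeps only the combinations that actually occur in the data
df$interact <- interaction(df[, 1:3], sep = "_", drop = TRUE)

# Number of observations behind each combination
counts <- table(df$interact)
print(counts)
```

Groups with a single observation have a "mean" that is just that one value, which is worth keeping in mind when reading the plots.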
Answer 1 (score: 0)
You can simply plot the output against each categorical variable:
plot(df[, 1], df[, 4])
plot(df[, 2], df[, 4])
plot(df[, 3], df[, 4])
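Because the x argument is a factor, each of these calls draws a boxplot of Output per pathology level. As a sketch under that reading, the three plots can be placed side by side and complemented with per-level means; the layout and the name `level_means` are assumptions, not part of the original answer:

```r
# Rebuild the question's data (same values as the dput output)
lv <- c("Absent", "Mild", "Mild/Moderate", "Moderate/Severe", "Severe")
df <- data.frame(
  Vascular_Pathology_M = factor(lv[c(1, 2, 3, 1, 1, 2, 4, 3, 1, 2)], levels = lv),
  Vascular_Pathology_F = factor(lv[c(4, 2, 1, 1, 1, 1, 2, 4, 1, 1)], levels = lv),
  Vascular_Pathology_O = factor(lv[c(1, 3, 4, 3, 1, 2, 1, 1, 1, 2)], levels = lv[1:4]),
  Output = c(1.01789418758932, 1.05627630598801, 1.49233946102323,
             1.38192374975672, 1.13097652937671, 0.861306979571144,
             0.707820561413699, 1.16628243128399, 0.983163398006992,
             1.23972603843843)
)

# Three boxplots in one row, one per pathology variable
op <- par(mfrow = c(1, 3))
for (v in names(df)[1:3]) {
  plot(df[[v]], df$Output, main = v, ylab = "Output", las = 2)
}
par(op)  # restore the previous graphics settings

# Mean Output per level of each variable (levels with no rows give NA)
level_means <- lapply(df[1:3], function(f) tapply(df$Output, f, mean))
print(level_means)
```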
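Answer 2 (score: 0)
You have a 4-dimensional dataset. One option is a scatter plot (x/y = two dimensions) drawn as a series of small multiples (one more dimension), with the Output variable mapped to a visual property such as size (that is the fourth dimension).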
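The example assumes the data has been put into a data.frame named my_dat (since df is already the name of a function in R). The points are jittered to show multiple observations at the same position, and colored by Y position to help make clear which point belongs to which category.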
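A minimal sketch of such a small-multiples plot, assuming ggplot2. Which pathology goes on x, on y, and in the panels is an arbitrary choice made here for illustration, as are the jitter amounts and transparency:

```r
library(ggplot2)

# Rebuild the question's data under the name my_dat (same values as the dput output)
lv <- c("Absent", "Mild", "Mild/Moderate", "Moderate/Severe", "Severe")
my_dat <- data.frame(
  Vascular_Pathology_M = factor(lv[c(1, 2, 3, 1, 1, 2, 4, 3, 1, 2)], levels = lv),
  Vascular_Pathology_F = factor(lv[c(4, 2, 1, 1, 1, 1, 2, 4, 1, 1)], levels = lv),
  Vascular_Pathology_O = factor(lv[c(1, 3, 4, 3, 1, 2, 1, 1, 1, 2)], levels = lv[1:4]),
  Output = c(1.01789418758932, 1.05627630598801, 1.49233946102323,
             1.38192374975672, 1.13097652937671, 0.861306979571144,
             0.707820561413699, 1.16628243128399, 0.983163398006992,
             1.23972603843843)
)

# x / y: two pathologies; panels: the third; point size: Output.
# Colour repeats the y variable so jittered points stay attributable to a category.
p <- ggplot(my_dat, aes(Vascular_Pathology_M, Vascular_Pathology_F)) +
  geom_jitter(aes(size = Output, colour = Vascular_Pathology_F),
              width = 0.2, height = 0.2, alpha = 0.7) +
  facet_wrap(~ Vascular_Pathology_O) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
print(p)
```

Jittering is random, so the exact point positions change from run to run; calling `set.seed()` beforehand makes the plot reproducible.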