Stratifying a density plot by different groups using ggplot2 in R

Time: 2015-03-05 15:33:05

Tags: r ggplot2 histogram categories density-plot

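I have a data frame in R called x with hundreds of rows. Each row is a person. I have two variables: Height, which is continuous, and Country, which is a factor. I want to plot a smoothed histogram of all of the individuals' heights, stratified by Country. I know I can do that with the following code: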

library(ggplot2)
ggplot(x, aes(x=Height, colour = (Country == "USA"))) + geom_density()

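This plots everyone from the USA in one colour (TRUE) and everyone from any other country in another colour (FALSE). However, what I would really like to do is plot everyone from the USA in one colour and everyone from Oman, Nigeria, and Switzerland in another colour. How would I adapt my code to do this?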

2 answers:

Answer 0 (score: 3)

I put together some data for illustration, using the built-in iris dataset:

head(iris)
table(iris$Species)
df <- iris
df$Species2 <- ifelse(df$Species == "setosa", "blue", 
               ifelse(df$Species == "virginica", "red", ""))

library(ggplot2)
p <- ggplot(df, aes(x = Sepal.Length, colour = (Species == "setosa")))
p + geom_density() # Your example

example with true and false

# Now let's choose the other created column
p <- ggplot(df, aes(x = Sepal.Length, colour = Species2))
p + geom_density() + facet_wrap(~Species2)

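example with extra column

Edit: to remove the "countries" you don't want in the plot, simply subset them out of the data frame you use for plotting (note that the labels won't exactly match the colours, but that can be changed within the data frame itself):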

p <- ggplot(df[df$Species2 %in% c("blue", "red"),], aes(x = Sepal.Length, colour = Species2))
p + geom_density() + facet_wrap(~Species2)

example with filtered data frame

To overlay the lines, just take out facet_wrap:

p + geom_density() 

example without facet_wrap
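
Applied to the data frame from the question, the same idea might look like the sketch below. The asker's x is not shown, so the data here are simulated. The column name Group, the extra country "France", and the chosen colours are assumptions made only for illustration.

```r
library(ggplot2)

# Simulated stand-in for the asker's data frame `x` (the real data are not shown)
set.seed(1)
x <- data.frame(
  Height  = rnorm(500, mean = 170, sd = 10),
  Country = factor(sample(c("USA", "Oman", "Nigeria", "Switzerland", "France"),
                          500, replace = TRUE))
)

# Hypothetical grouping column: USA vs. the three other countries of interest;
# every remaining country becomes NA
x$Group <- ifelse(x$Country == "USA", "USA",
           ifelse(x$Country %in% c("Oman", "Nigeria", "Switzerland"),
                  "Oman/Nigeria/Switzerland", NA))

# Drop the rows that belong to neither group, then overlay the two densities
x_sub <- x[!is.na(x$Group), ]
p <- ggplot(x_sub, aes(x = Height, colour = Group)) +
  geom_density() +
  scale_colour_manual(values = c("USA" = "blue",
                                 "Oman/Nigeria/Switzerland" = "red"))
print(p)
```

Naming the groups after what they contain and setting the colours with scale_colour_manual() is one way to keep the legend labels and the line colours consistent.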

Answer 1 (score: 0)

I was happy to build on the excellent answer above. Here is my mod.

df <- iris
df$Species2 <- ifelse(df$Species == "setosa", "blue", 
           ifelse(df$Species == "virginica", "red", ""))
homes2006 <- df

names(homes2006)[names(homes2006)=="Species"] <- "ownership"
homes2006a <- as.data.frame(sapply(homes2006, gsub,
                                   pattern = "setosa",
                                   replacement = "renters"))
homes2006b <- as.data.frame(sapply(homes2006a, gsub,
                                   pattern = "virginica",
                                   replacement = "home-owners"))
homes2006c <- as.data.frame(sapply(homes2006b, gsub,
                                   pattern = "versicolor",
                                   replacement = "home-owners"))

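## sapply() returns a character matrix, so Sepal.Length is no longer numeric
## (it becomes a factor in R < 4.0.0); go through as.character() so the
## original values, not the factor codes, are recovered
homes2006c[, 1] <- as.numeric(as.character(homes2006c[, 1]))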

library(ggplot2)

p <- ggplot(homes2006c, aes(x = Sepal.Length, 
           colour = (ownership == "home-owners")))

p + ylab("number of households") +
xlab("monthly income (NIS)") +
ggtitle("income distribution by home ownership") +
geom_density()
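
The conversion of Sepal.Length happens because sapply() with gsub() turns every column into text. A possible alternative, sketched here rather than taken from the answer, is to recode the factor levels directly, which leaves the numeric columns untouched. The data frame name homes is hypothetical.

```r
library(ggplot2)

homes <- iris
names(homes)[names(homes) == "Species"] <- "ownership"

# Assigning a named list to levels() merges and renames factor levels in one step
levels(homes$ownership) <- list(
  "renters"     = "setosa",
  "home-owners" = c("versicolor", "virginica")
)

# Sepal.Length is still numeric, so no conversion step is needed
stopifnot(is.numeric(homes$Sepal.Length))

p <- ggplot(homes, aes(x = Sepal.Length, colour = ownership)) +
  geom_density()
print(p)
```

Mapping colour to ownership itself, instead of to the logical test (ownership == "home-owners"), makes the legend show the group names rather than TRUE and FALSE.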
