I currently have some data that is basically factors and dates. Here is a simplified version.
date <- c(1901,1901,1901,1902,1902,1902,1901,1903,1902,1904,1902,1903,1903,1904,1905, 1901,1903,1902,1904,1902,1902,1903,1904,1902,1902,1901,1903,1903,1904,1905, 1905,1906,1907,1908,1901,1908,1907,1905,1906,1902,1903,1903,1903,1904,1905,1901,1901,1901,1902,1902,1902,1901,1903,1902,1904,1902,1903,1903,1904,1905,
1901,1903,1902,1904,1902,1902,1903,1904,1902,1902,1901,1903,1903,1904,1905,
1905,1906,1907,1908,1901,1908,1907,1905,1906,1902,1903,1903,1903,1904,1905,
1905,1906,1907,1908,1901,1908,1907,1920,1920,1920,1921,1921,1921,1921,1921)
genre <- sample(c("fiction","nonfiction"),105,replace=TRUE)
data <- as.data.frame(cbind(date,genre))
# I know this is not an ideal way to coerce to a numeric
data$date <- as.numeric(as.character(data$date))
So far, so good. However, as you will notice if you plot it, there is a large gap in the data that the line obscures. This plot illustrates the problem.
library(ggplot2)
ggplot(data,aes(x=date,color=genre)) + geom_line(stat='count')
I saw this post, which suggests adding a group, and I can do that.
data$group <- ifelse(data$date < 1910,1,2)
ggplot(data,aes(x=date,color=genre,group=group)) + geom_line(stat='count')
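So it seems there is no way to keep the color aesthetic I want in the output and also use group while specifying stat='count'. For example, this plot nicely shows the gap in the data, but loses the color/distinction based on the genre factor: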
ggplot(data,aes(x=date,color=genre,group=group)) + geom_line(stat='count')
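So, is this not possible? Am I missing something? Is there a better way to do this, or do I need to summarize or otherwise transform my data so that I don't rely on stat='count' at the plotting stage?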
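For reference, here is a minimal sketch of the two directions the question raises, not a definitive answer. It assumes that grouping by the combination of genre and group (via interaction()) is acceptable, and it uses a small made-up stand-in for the data above. The names counts, p1 and p2 are hypothetical.

```r
library(ggplot2)

# Made-up stand-in for the question's data (assumption: same shape,
# years 1901-1908 and 1920-1921 with a gap in between).
set.seed(1)
date  <- c(rep(1901:1908, each = 5), rep(1920:1921, each = 5))
genre <- sample(c("fiction", "nonfiction"), length(date), replace = TRUE)
data  <- data.frame(date, genre)
data$group <- ifelse(data$date < 1910, 1, 2)

# Direction 1: keep stat = 'count', but group by genre AND period,
# so each genre gets one line per period and color stays per genre.
p1 <- ggplot(data, aes(x = date, color = genre,
                       group = interaction(genre, group))) +
  geom_line(stat = "count")

# Direction 2: count up front, then draw plain lines from the summary.
counts <- as.data.frame(table(date = data$date, genre = data$genre),
                        stringsAsFactors = FALSE)
counts <- counts[counts$Freq > 0, ]      # drop empty year/genre combinations
counts$date  <- as.numeric(counts$date)  # table() labels come back as text
counts$group <- ifelse(counts$date < 1910, 1, 2)

p2 <- ggplot(counts, aes(x = date, y = Freq, color = genre,
                         group = interaction(genre, group))) +
  geom_line()
```

Printing p1 or p2 should draw one line per genre on each side of the gap, with nothing connecting 1908 to 1920. Summarizing first is more verbose, but it makes the counts visible for inspection before plotting.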