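Plotting counts of discrete data by date

Time: 2017-12-08 19:06:19

Tags: r dataframe plot ggplot2

I am new to ggplot2 and am trying to plot a continuous histogram that shows how the number of reviews evolves by date and rating.

My dataset looks like this: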

        date rating reviews
1 2017-11-24      1 some text here
2 2017-11-24      1 some text here
3 2017-12-02      5 some text here
4 2017-11-24      3 some text here
5 2017-11-24      3 some text here
6 2017-11-24      4 some text here

What I want is something like this:

For rating == 1

        date    count
1  2017-11-24      2
2  2017-11-25      7
.
.
.

and likewise for rating == 2, 3, and so on.

I tried

ggplot(aes(x = date, y = rating), data = df) + geom_line()

But this only puts the rating on the y-axis, not the count of reviews.

2 Answers:

Answer 0 (score: 1)

You can use dplyr to build the summarised dataset you need and pipe it straight into ggplot():

library(dplyr)
library(ggplot2)

 # sample_data is defined under "Data" below
 sample_data %>%
   group_by(rating, date) %>%
   summarise(n = n()) %>%   # number of reviews per rating and date
   ggplot(aes(x = date, y = n, group = rating, color = as.factor(rating))) +
   geom_line(size = 1.5) +
   geom_point()


Data:

sample_data <- structure(list(id = c(1L, 2L, 2L, 3L, 4L, 5L, 5L, 6L, 6L, 1L,           
     2L, 3L, 3L, 4L, 5L, 6L, 1L, 2L, 2L, 2L, 3L, 4L, 5L, 6L), date = structure(c(1L, 
     1L, 3L, 7L, 1L, 1L, 1L, 1L, 5L, 2L, 3L, 8L, 8L, 3L, 4L, 5L, 5L,                 
     6L, 6L, 6L, 9L, 6L, 6L, 6L), .Label = c("2017-11-24", "2017-11-25",             
     "2017-11-26", "2017-11-27", "2017-11-28", "2017-11-29", "2017-12-02",           
     "2017-12-04", "2017-12-08"), class = "factor"), rating = c(1L,                  
     1L, 1L, 5L, 3L, 3L, 3L, 4L, 4L, 1L, 1L, 5L, 5L, 3L, 3L, 4L, 1L,                 
     1L, 1L, 1L, 5L, 3L, 3L, 4L), reviews = structure(c(1L, 1L, 1L,                  
     1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,                 
     1L, 1L, 1L, 1L, 1L), .Label = "review", class = "factor")), .Names = c("id",    
     "date", "rating", "reviews"), row.names = c(NA, 24L), class = "data.frame")   
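
One thing worth noting about the data above: `date` is stored as a factor, so ggplot2 treats the x-axis as discrete and spaces the dates evenly regardless of the gaps between them. A minimal sketch of one way around this, assuming the dates are in `YYYY-MM-DD` form, converts the column with `as.Date()` and uses `dplyr::count()` as shorthand for `group_by()` plus `summarise(n = n())`. The data frame `reviews_df` is made up for illustration and only mirrors the six rows shown in the question.

```r
library(dplyr)
library(ggplot2)

# Hypothetical data mirroring the six rows shown in the question
reviews_df <- data.frame(
  date   = c("2017-11-24", "2017-11-24", "2017-12-02",
             "2017-11-24", "2017-11-24", "2017-11-24"),
  rating = c(1L, 1L, 5L, 3L, 3L, 4L),
  stringsAsFactors = FALSE
)

# Convert to Date so the x-axis is a real time scale,
# then count the reviews for each rating/date pair
counts <- reviews_df %>%
  mutate(date = as.Date(date)) %>%
  count(rating, date)

# One line (and set of points) per rating
p <- ggplot(counts, aes(x = date, y = n, color = factor(rating))) +
  geom_line() +
  geom_point()

print(p)
```

With only six rows most ratings have a single point, so ggplot2 may report that a group contains only one observation; on a fuller dataset each rating gets a proper line.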

Answer 1 (score: 1)

Using just some dummy data:

  library(tidyverse)
  set.seed(999)

  # 2000 random reviews with dates between 2017-01-01 and 2017-04-01
  df <- data.frame(date = sample(seq(as.Date('2017/01/01'), as.Date('2017/04/01'), by = "day"), 2000, replace = TRUE),
                   rating = sample(1:5, 2000, replace = TRUE))
  df$rating <- as.factor(df$rating)

  # Count reviews per date and rating, then draw one line per rating
  df %>%
    group_by(date, rating) %>%
    summarise(n = length(rating)) %>%
    ggplot(aes(date, n, color = rating)) +
    geom_line() +
    geom_point()
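
A grouped count only contains the date/rating combinations that actually occur, so the line for a rating runs straight from one observed date to the next and never drops to zero on days without reviews. If those days should be shown as zero, one possible approach (a sketch that is not part of either answer) is `tidyr::complete()`, which adds the missing combinations. The `toy` data frame below is hypothetical and chosen so that rating 1 has no reviews on 2017-11-25.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical toy data: rating 1 has no reviews on 2017-11-25
toy <- data.frame(
  date   = as.Date(c("2017-11-24", "2017-11-24", "2017-11-25",
                     "2017-11-26", "2017-11-26")),
  rating = factor(c(1, 2, 2, 1, 2))
)

# Count per date and rating, then add every missing day/rating
# combination with a count of zero
filled <- toy %>%
  count(date, rating) %>%
  complete(date = seq(min(date), max(date), by = "day"),
           rating,
           fill = list(n = 0L))

p <- ggplot(filled, aes(date, n, color = rating)) +
  geom_line() +
  geom_point()

print(p)
```

Whether zero-filling is appropriate depends on the data: it makes sense when a missing row really means "no reviews that day", and less so when days are missing because nothing was collected.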