Using R to get the mean of a set of variables after grouping by date

Posted: 2018-05-15 14:36:35

Tags: r reshape lubridate melt

Here is my dataset: https://app.box.com/s/x5eux7mhdc0geyck4o47ttmpynah0wqk

Snapshot of the data: [image]

I want to create a data frame that holds the mean of each sentiment for every 2-month period.
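The key step is binning the timestamps into 2-month periods. As a minimal base-R sketch on a few hypothetical timestamps (not taken from the linked dataset), `cut()` on a POSIXct vector accepts an interval string such as `"2 months"` and returns a factor. The bins start on the first day of the month of the earliest timestamp, and each bin includes its left edge (`right = FALSE` is the default for date-times).

```r
# Hypothetical toy timestamps standing in for sentiment_dataset$created_at
created_at <- as.POSIXct(c("2010-09-15 10:00:00", "2010-10-20 08:30:00",
                           "2010-11-05 12:00:00", "2011-01-02 09:15:00"),
                         tz = "UTC")

# Bin into consecutive 2-month periods; the result is a factor
bins <- cut(created_at, breaks = "2 months")

print(bins)
print(table(bins))  # number of timestamps per 2-month period
```

The first two timestamps fall in the same period (September to October 2010). The third and fourth each land in a later period.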

I tried the following code:

library(lubridate)
library(dplyr)

# Parse the timestamp strings into POSIXct date-times
sentiment_dataset$created_at <- ymd_hms(sentiment_dataset$created_at)

sentiment_time <- sentiment_dataset %>%
  group_by(created_at = cut(created_at, breaks = "2 months")) %>%
  summarise(negative = mean(negative),
            positive = mean(positive)) %>%
  melt

It gives the following error:

Using created_at as id variables
Error in match.names(clabs, names(xi)) : names do not match previous names
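The note "Using created_at as id variables" is printed by `melt()`, which suggests the error is raised in the final reshaping step rather than during the grouping or averaging. A minimal base-R sketch that does the same grouping and averaging without `melt()` is shown below. It assumes a data frame with the columns `created_at` (POSIXct), `negative` and `positive`; the toy values and the helper column `period` are hypothetical. It uses `aggregate()` for the per-period means and then builds the long table by hand.

```r
# Hypothetical toy data with the same column names as sentiment_dataset
sentiment_dataset <- data.frame(
  created_at = as.POSIXct(c("2010-09-15 10:00:00", "2010-10-20 08:30:00",
                            "2010-11-05 12:00:00", "2011-01-02 09:15:00"),
                          tz = "UTC"),
  negative = c(0, 1, 0, 2),
  positive = c(2, 1, 1, 0)
)

# Step 1: assign each row to a 2-month period (a factor)
sentiment_dataset$period <- cut(sentiment_dataset$created_at, breaks = "2 months")

# Step 2: mean of each sentiment column per period (wide format)
wide <- aggregate(cbind(negative, positive) ~ period,
                  data = sentiment_dataset, FUN = mean)

# Step 3: stack the two mean columns into long format without melt()
long <- data.frame(
  period     = rep(wide$period, times = 2),
  sentiment  = rep(c("negative", "positive"), each = nrow(wide)),
  mean_value = c(wide$negative, wide$positive)
)

print(wide)
print(long)
```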

2 Answers:

Answer 0 (score: 1)

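I'm not sure you can create the grouping variable inside the group_by statement. It does seem to work, however, if you create it beforehand with mutate: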

library(dplyr)
library(tidyr)

sentiment_time <- sentiment_dataset %>%
  # Bin the date-times into 2-month periods (a factor)
  mutate(created_at = cut(created_at, breaks = "2 months")) %>%
  group_by(created_at) %>%
  summarize(negative = mean(negative),
            positive = mean(positive)) %>%
  # Reshape to long format: one row per period and sentiment
  gather('sentiment', 'mean_value', negative, positive)
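In tidyr 1.0.0 and later, `gather()` is superseded by `pivot_longer()`. The sketch below runs the same pipeline with `pivot_longer()` on hypothetical toy data that uses the same column names; it is one possible variant, not the answerer's code.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with the same column names as sentiment_dataset
sentiment_dataset <- tibble(
  created_at = as.POSIXct(c("2010-09-15 10:00:00", "2010-10-20 08:30:00",
                            "2010-11-05 12:00:00", "2011-01-02 09:15:00"),
                          tz = "UTC"),
  negative = c(0, 1, 0, 2),
  positive = c(2, 1, 1, 0)
)

sentiment_time <- sentiment_dataset %>%
  # Bin the date-times into 2-month periods (a factor)
  mutate(created_at = cut(created_at, breaks = "2 months")) %>%
  group_by(created_at) %>%
  summarize(negative = mean(negative),
            positive = mean(positive)) %>%
  # pivot_longer() replaces gather(): one row per period and sentiment
  pivot_longer(c(negative, positive),
               names_to = "sentiment", values_to = "mean_value")

print(sentiment_time)
```

Unlike `gather()`, `pivot_longer()` keeps the rows of each period together, so the output alternates negative and positive within each period.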

Answer 1 (score: 1)

I'd check out the tibbletime package:

library(tibbletime)
library(tidyverse)

sentiment_dataset %>%
  arrange(created_at) %>%
  as_tbl_time(index = created_at) %>%
  collapse_by("2 months", clean = TRUE) %>%
  group_by(created_at) %>%
  summarise(negative = mean(negative),
            positive = mean(positive))

# A time tibble: 48 x 3
# Index: created_at
   created_at          negative positive
   <dttm>                 <dbl>    <dbl>
 1 2010-09-01 00:00:00    0.143    1.43 
 2 2010-11-01 00:00:00    0.273    0.727
 3 2011-01-01 00:00:00    0.208    0.792
 4 2011-03-01 00:00:00    0.5      1.38 
 5 2011-05-01 00:00:00    0.25     0.75 
 6 2011-07-01 00:00:00    1        1    
 7 2011-09-01 00:00:00    0        1.5  
 8 2011-11-01 00:00:00    0.333    1    
 9 2012-01-01 00:00:00    0        0    
10 2012-03-01 00:00:00    0        1.6  
# ... with 38 more rows

Of course, you'll probably want to pipe the result into a gather() command afterwards... for example:

sentiment_dataset %>%
  arrange(created_at) %>%
  as_tbl_time(index = created_at) %>%
  collapse_by("2 months", clean = TRUE) %>%
  group_by(created_at) %>%
  summarise(negative = mean(negative),
            positive = mean(positive)) %>%
  gather(sentiment, mean_sentiment, -created_at) %>%
  ggplot(., aes(created_at, mean_sentiment, color = sentiment)) +
  geom_point() +
  geom_line() +
  geom_smooth()

[Line plot]