Here is my dataset: https://app.box.com/s/x5eux7mhdc0geyck4o47ttmpynah0wqk
I want to create a data frame in which the mean sentiment values are computed over groups of 2 months.
I tried the following code:
sentiment_dataset$created_at <- ymd_hms(sentiment_dataset$created_at)
sentiment_time <- sentiment_dataset %>%
  group_by(created_at = cut(created_at, breaks = "2 months")) %>%
  summarise(negative = mean(negative),
            positive = mean(positive)) %>% melt
It fails with the following error:
Using created_at as id variables
Error in match.names(clabs, names(xi)) :
names do not match previous names
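Answer 0 (score: 1)
I'm not sure whether you can create the grouping variable inside the group_by statement. However, creating it beforehand with mutate appears to work.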
library(dplyr)
library(tidyr)
sentiment_time <- sentiment_dataset %>%
  # cut() turns created_at into a factor of 2-month bins
  mutate(created_at = cut(created_at, breaks = "2 months")) %>%
  group_by(created_at) %>%
  summarize(negative = mean(negative),
            positive = mean(positive)) %>%
  # reshape to long format: one row per bin and sentiment type
  gather('sentiment', 'mean_value', negative, positive)
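Since the dataset sits behind a link, here is a minimal self-contained sketch of the same mutate-then-group idea on made-up data. The `toy` tibble, its values, and the `period` column name are hypothetical and only stand in for `sentiment_dataset`. It also swaps `gather()` for `pivot_longer()`, which tidyr (1.0.0 and later) recommends as its successor; this assumes dplyr >= 1.0.0 for the `.groups` argument.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for sentiment_dataset
toy <- tibble(
  created_at = as.POSIXct(c("2010-09-03 10:00:00", "2010-10-15 12:30:00",
                            "2010-11-20 08:00:00", "2011-01-05 09:45:00"),
                          tz = "UTC"),
  negative = c(0, 1, 2, 0),
  positive = c(2, 1, 0, 4)
)

toy_time <- toy %>%
  # create the 2-month bins first (a factor), then group on them
  mutate(period = cut(created_at, breaks = "2 months")) %>%
  group_by(period) %>%
  summarise(negative = mean(negative),
            positive = mean(positive),
            .groups = "drop") %>%
  # long format: one row per bin and sentiment type
  pivot_longer(c(negative, positive),
               names_to = "sentiment", values_to = "mean_value")

print(toy_time)
```

Note that `cut()` returns a factor, so `period` would need converting back to a date before it can be used on a continuous time axis.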
Answer 1 (score: 1)
I would check out the tibbletime package:
library(tibbletime)
library(tidyverse)
sentiment_dataset %>%
  arrange(created_at) %>%
  as_tbl_time(index = created_at) %>%
  # collapse the index so all rows in the same 2-month period share one date
  collapse_by("2 months", clean = TRUE) %>%
  group_by(created_at) %>%
  summarise(negative = mean(negative),
            positive = mean(positive))
# A time tibble: 48 x 3
# Index: created_at
created_at negative positive
<dttm> <dbl> <dbl>
1 2010-09-01 00:00:00 0.143 1.43
2 2010-11-01 00:00:00 0.273 0.727
3 2011-01-01 00:00:00 0.208 0.792
4 2011-03-01 00:00:00 0.5 1.38
5 2011-05-01 00:00:00 0.25 0.75
6 2011-07-01 00:00:00 1 1
7 2011-09-01 00:00:00 0 1.5
8 2011-11-01 00:00:00 0.333 1
9 2012-01-01 00:00:00 0 0
10 2012-03-01 00:00:00 0 1.6
# ... with 38 more rows
Of course, you may want to pipe the result into a gather() call after this, for example:
sentiment_dataset %>%
  arrange(created_at) %>%
  as_tbl_time(index = created_at) %>%
  collapse_by("2 months", clean = TRUE) %>%
  group_by(created_at) %>%
  summarise(negative = mean(negative),
            positive = mean(positive)) %>%
  gather(sentiment, mean_sentiment, -created_at) %>%
  ggplot(., aes(created_at, mean_sentiment, color = sentiment)) +
  geom_point() +
  geom_line() +
  geom_smooth()
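If adding tibbletime is not an option, a similar two-month binning that keeps the bins as date-times (so they can go straight onto a time axis) can be sketched with `lubridate::floor_date()`. This is a swapped-in technique, not what the code above uses, and the `toy` data and the `period` column are hypothetical. As far as I understand, `floor_date()` aligns its bins to the calendar (Jan-Feb, Mar-Apr, and so on), whereas `cut()` starts counting from the first month present in the data, so the two can produce different bin edges.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data standing in for sentiment_dataset
toy <- tibble(
  created_at = ymd_hms(c("2010-09-03 10:00:00", "2010-10-15 12:30:00",
                         "2010-11-20 08:00:00", "2011-01-05 09:45:00")),
  negative = c(0, 1, 2, 0),
  positive = c(2, 1, 0, 4)
)

toy_binned <- toy %>%
  # round each timestamp down to the start of its 2-month bin;
  # the result stays a date-time rather than becoming a factor
  mutate(period = floor_date(created_at, "2 months")) %>%
  group_by(period) %>%
  summarise(negative = mean(negative),
            positive = mean(positive),
            .groups = "drop")

print(toy_binned)
```

From here the same gather() and ggplot() steps should apply, with `period` in place of `created_at`.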