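I'm using the public dataset of Donald Trump tweets, which can be found here: https://www.kaggle.com/kingburrito666/better-donald-trump-tweets
Having loaded it, I'm trying to group it by date in R. First, I want to count the number of tweets per day. Second, I want to sum the "Favourites" and "Retweets" per day.
I have written the following code, but it keeps giving me errors... Can you help me?
Thanks!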
Donald <- read_csv(file="Donald-Tweets!.csv")
Donald
#Grouped
G_filter <- Donald %>%
select(Date,twt_favourites_IS_THIS_LIKE_QUESTION_MARK, Retweets) %>%
rename( Favourites = twt_favourites_IS_THIS_LIKE_QUESTION_MARK) %>%
group_by(as.Date.date(Date)) %>%
summarise(Total = sum(Favourites+Retweets), count(n))
View(G_filter)
Answer 0 (score: 1)
This may be what you're looking for:
library(tidyverse)
G_filter <- Donald %>%
  # keep only the columns needed for the daily totals
  select(Date, twt_favourites_IS_THIS_LIKE_QUESTION_MARK, Retweets) %>%
  rename(Favourites = twt_favourites_IS_THIS_LIKE_QUESTION_MARK) %>%
  # group on the Date column directly (one group per date string)
  group_by(Date) %>%
  mutate(Favorites_and_Retweets = Favourites + Retweets) %>%
  # n() counts the rows (tweets) in each group
  summarise(Favorites_and_Retweets = sum(Favorites_and_Retweets),
            Count = n())
G_filter
# # A tibble: 479 x 3
# Date Favorites_and_Retweets Count
# <chr> <int> <int>
# 1 15-07-16 66899 39
# 2 15-07-17 65212 22
# 3 15-07-18 97381 32
# 4 15-07-19 34229 12
# 5 15-07-20 62316 37
# 6 15-07-21 88132 62
# 7 15-07-22 69919 37
# 8 15-07-23 67963 43
# 9 15-07-24 67687 35
# 10 15-07-25 39744 25
# # ... with 469 more rows
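Since the question asks for the favourites and retweets per day, it may also help to keep the two sums separate next to the combined total. The sketch below runs the same kind of pipeline on a small hypothetical tibble (invented values, same column names as the CSV) so it can be tried without the Kaggle file. The two changes relative to the question's code are that `summarise()` counts tweets with `n()` (`count()` expects a data frame, so `count(n)` does not belong inside `summarise()`), and that `group_by()` receives the `Date` column itself rather than `as.Date.date(Date)`, a base-R method intended for objects of the old `"date"` class, not a character column.

```r
library(dplyr)

# Hypothetical toy data standing in for Donald-Tweets!.csv:
# same column names as the real file, invented values.
tweets <- tibble(
  Date = c("15-07-16", "15-07-16", "15-07-17", "15-07-17", "15-07-17"),
  twt_favourites_IS_THIS_LIKE_QUESTION_MARK = c(100L, 200L, 50L, 60L, 70L),
  Retweets = c(10L, 20L, 5L, 6L, 7L)
)

per_day <- tweets %>%
  rename(Favourites = twt_favourites_IS_THIS_LIKE_QUESTION_MARK) %>%
  group_by(Date) %>%                                     # one group per date string
  summarise(
    Total_Favourites = sum(Favourites),                  # favourites that day
    Total_Retweets = sum(Retweets),                      # retweets that day
    Favorites_and_Retweets = sum(Favourites + Retweets), # combined total
    Count = n()                                          # number of tweets that day
  )

print(per_day)
```

With one grouping variable, `summarise()` returns an ungrouped tibble with one row per date, sorted by `Date`.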
There is no need to convert the Date column. If you do want to, one way is to load the lubridate package and then add mutate(Date = ymd(Date)) to the pipe.
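Converting `Date` to a real `Date` class makes the groups sort chronologically and lets you use date functions afterwards. The sketch below is a base-R alternative to `ymd()`. It assumes, based on the printed output above (e.g. `15-07-16`), that the strings are a two-digit year, month and day, and it uses a small hypothetical tibble rather than the real file.

```r
library(dplyr)

# Hypothetical toy data; the real Date column prints like "15-07-16",
# assumed here to be yy-mm-dd.
tweets <- tibble(
  Date = c("15-07-16", "15-07-17", "15-07-17"),
  Retweets = c(10L, 5L, 6L)
)

per_day <- tweets %>%
  # base-R parse of "yy-mm-dd" strings; %y maps 15 to 2015
  mutate(Date = as.Date(Date, format = "%y-%m-%d")) %>%
  group_by(Date) %>%
  summarise(Retweets = sum(Retweets), Count = n())

print(per_day)
```

For strings in this format, `mutate(Date = ymd(Date))` from lubridate, as suggested above, should produce the same `Date` values; the base version only avoids the extra package.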