Creating a time-frequency plot with ggplot2

Time: 2015-09-12 02:43:23

Tags: r ggplot2 time-series frequency

I'm having trouble creating a ggplot2 plot from my data. I need a plot that looks like this: [image]

Any suggestions would be a great help for my research. Thank you in advance for your time and effort.

A very small sample of the data set (df) looks like this:

tweet_created_at     hashtag_text
2015-05-08 00:07:58  ogretmenemayistamujdehazirandaatama
2015-05-08 00:07:58  onlarkonusurakpartiyapar
2015-05-08 00:10:48  ogretmenemayistamujdehazirandaatama
2015-05-08 00:10:48  onlarkonusurakpartiyapar
2015-05-08 02:50:03  onlarkonusurakpartiyapar
2015-05-08 00:10:56  ogretmenemayistamujdehazirandaatama
2015-05-08 00:10:56  onlarkonusurakpartiyapar
2015-05-08 02:53:13  onlarkonusurakpartiyapar
2015-05-08 02:53:13  pinokyokemal
2015-05-08 00:11:03  ogretmenemayistamujdehazirandaatama
2015-05-08 00:11:03  onlarkonusurakpartiyapar
2015-05-08 00:11:06  ogretmenemayistamujdehazirandaatama
2015-05-08 00:11:06  onlarkonusurakpartiyapar
2015-05-08 02:53:48  bingolunkararibuyumenindevami
2015-05-08 02:53:48  onlarkonusurakpartiyapar
2015-05-08 00:11:17  ogretmenemayistamujdehazirandaatama
2015-05-08 00:11:17  onlarkonusurakpartiyapar
2015-05-08 00:16:21  ogretmenemayistamujdehazirandaatama
2015-05-08 00:16:21  onlarkonusurakpartiyapar

I have used the following script, but I could not work out how to produce the frequency part:

ggplot(data=df,
       aes(x=as.POSIXct(tweet_created_at), y=hashtag_text,color=hashtag_text)) +
  geom_line()

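I know the y-axis values are wrong, since y should be the number of tweets rather than the hashtag itself, but I can't find the right way to express that. The script produces something like this: [image]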

PS: My data set contains hundreds of hashtags, so I need to select only the top 25.

1 answer:

Answer 0 (score: 1)

You can use geom_freqpoly. If your tweet_created_at variable is not already POSIXct, convert it:

df$tweet_created_at <- as.POSIXct(df$tweet_created_at)
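If the timestamps are stored as character strings, it may be safer to spell out the format and time zone rather than rely on the defaults. A minimal sketch, assuming the strings look like the sample in the question; tz = "UTC" is an assumption, since the question does not say which zone the tweets were recorded in:

```r
# Example timestamps copied from the question's sample data
ts_chr <- c("2015-05-08 00:07:58", "2015-05-08 02:50:03")

# tz = "UTC" is an assumption; replace it with the zone of your data
ts <- as.POSIXct(ts_chr, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# POSIXct differences can be expressed in seconds, the unit binwidth uses later
elapsed_secs <- as.numeric(difftime(ts[2], ts[1], units = "secs"))
```

Strings that do not match the format come back as NA, so checking anyNA(ts) after the conversion is a cheap sanity test.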

Then find the most frequent hashtags and create a selection variable:

# Selects the top 2 hashtags here; change 2 to 25 for the full data set
hashtag_table <- sort(table(df$hashtag_text), decreasing = TRUE)
df$select <- as.character(df$hashtag_text) %in% names(hashtag_table)[1:2]
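The same selection can be written with the number of hashtags as a parameter. This is only a sketch on toy data taken from the question's sample; top_n and top_tags are hypothetical names introduced here, not part of the original answer:

```r
# Toy data standing in for df$hashtag_text (values from the question's sample)
df <- data.frame(
  hashtag_text = c("onlarkonusurakpartiyapar", "ogretmenemayistamujdehazirandaatama",
                   "onlarkonusurakpartiyapar", "pinokyokemal",
                   "onlarkonusurakpartiyapar", "ogretmenemayistamujdehazirandaatama"),
  stringsAsFactors = FALSE
)

top_n <- 2  # hypothetical name; set to 25 for the full data set

# Count each hashtag and order the counts from most to least frequent
hashtag_table <- sort(table(df$hashtag_text), decreasing = TRUE)

# head() returns at most top_n names, so there are no NA entries
# when the data contain fewer than top_n distinct hashtags
top_tags <- head(names(hashtag_table), top_n)
df$select <- df$hashtag_text %in% top_tags
```

Using head() instead of [1:25] mainly matters for small subsets, where indexing past the end of the table would yield NA names.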

Then plot:

library(ggplot2)

p1 <- ggplot(df[df$select, ],
             aes(x = tweet_created_at, group = hashtag_text, colour = hashtag_text)) +
  geom_freqpoly(binwidth = 30 * 60)  # x is POSIXct, so binwidth is in seconds; here 30 minutes
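To check what the frequency polygon is counting, the per-bin counts can also be computed by hand in base R. This is a rough sketch on a few rows from the question's sample; bin_secs, bin_start and counts are hypothetical names, tz = "UTC" is an assumption, and the bin edges ggplot2 picks may not coincide exactly with the clock-aligned ones used here:

```r
# Toy data: a few rows from the question's sample; tz = "UTC" is an assumption
df <- data.frame(
  tweet_created_at = as.POSIXct(
    c("2015-05-08 00:07:58", "2015-05-08 00:10:48", "2015-05-08 00:16:21",
      "2015-05-08 02:50:03", "2015-05-08 02:53:13"),
    tz = "UTC"),
  hashtag_text = "onlarkonusurakpartiyapar",
  stringsAsFactors = FALSE
)

bin_secs <- 30 * 60  # same width as binwidth in the plot

# Floor each timestamp to the start of its 30-minute bin
bin_start <- as.POSIXct(
  floor(as.numeric(df$tweet_created_at) / bin_secs) * bin_secs,
  origin = "1970-01-01", tz = "UTC")

# Count tweets per bin and hashtag
counts <- as.data.frame(
  table(bin_start = format(bin_start, "%Y-%m-%d %H:%M"),
        hashtag_text = df$hashtag_text),
  stringsAsFactors = FALSE)
```

The Freq column of counts is the quantity that should appear on the y axis, which is what the original geom_line() attempt was missing.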

Result (shown with facets because the data overlap): [image]
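The answer does not show the faceting code. One way to get a panel per hashtag, offered as a sketch rather than as the answer's exact method, is to add facet_wrap() to the same plot. The toy data, the name p2, the axis labels and tz = "UTC" are assumptions made for illustration:

```r
library(ggplot2)

# Toy data from the question's sample; tz = "UTC" is an assumption
df <- data.frame(
  tweet_created_at = as.POSIXct(
    c("2015-05-08 00:07:58", "2015-05-08 00:07:58", "2015-05-08 02:50:03",
      "2015-05-08 02:53:13", "2015-05-08 00:16:21"),
    tz = "UTC"),
  hashtag_text = c("ogretmenemayistamujdehazirandaatama", "onlarkonusurakpartiyapar",
                   "onlarkonusurakpartiyapar", "onlarkonusurakpartiyapar",
                   "ogretmenemayistamujdehazirandaatama")
)

p2 <- ggplot(df, aes(x = tweet_created_at, colour = hashtag_text)) +
  geom_freqpoly(binwidth = 30 * 60) +      # 30-minute bins, in seconds
  facet_wrap(~ hashtag_text, ncol = 1) +   # one panel per hashtag
  labs(x = "Time", y = "Number of tweets")
```

With 25 hashtags a single column of panels gets tall, so a larger ncol or dropping the colour legend may read better.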