Question

I'm trying to use R to analyze Twitter data by plotting the number of tweets over a period of time. When I write

plot(tweet_df$created_at, tweet_df$text)

I get this error message:

Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
2: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf
5: In min(x) : no non-missing arguments to min; returning Inf
6: In max(x) : no non-missing arguments to max; returning -Inf

Here is the code I'm using:

library("rjson")
json_file <- "tweet.json"
json_data <- fromJSON(file=json_file)
library("streamR")
tweet_df <- parseTweets(tweets=file)
#using the twitter data frame
tweet_df$created_at
tweet_df$text
plot(tweet_df$created_at, tweet_df$text)

Answer 1

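You have a few problems here, but nothing insurmountable. If you want to track tweets over time, what you're really asking for is the number of tweets created per time frame (tweets per minute, per second, whatever). That means you only need the created_at column, and you can build the chart with R's hist function.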

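If you want to break the counts down by words mentioned in the text or by something else, that's doable too, but you should use ggplot2 for that and perhaps ask another question. Anyway, it looks like parseTweets returns Twitter's timestamp as a character field (which is why plot() emitted the "NAs introduced by coercion" warnings and then found no finite axis limits), so you'll want to convert it to a POSIXct timestamp that R can understand. Assuming your data frame looks something like this: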

❥ head(tweet_df[,c("id_str","created_at")])
              id_str                     created_at
1 597862782101561346 Mon May 11 20:36:09 +0000 2015
2 597862782097346560 Mon May 11 20:36:09 +0000 2015
3 597862782105694208 Mon May 11 20:36:09 +0000 2015
4 597862782105694210 Mon May 11 20:36:09 +0000 2015
5 597862782076198912 Mon May 11 20:36:09 +0000 2015
6 597862782114078720 Mon May 11 20:36:09 +0000 2015

You can do this:

❥ dated_tweets <- as.POSIXct(tweet_df$created_at, format = "%a %b %d %H:%M:%S +0000 %Y")
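It's worth checking that the conversion actually worked before plotting. Here is a minimal sketch on made-up timestamps in the same format as above (the sample values, the variable names, and the tz = "UTC" argument are my own assumptions, not part of the original data). Since %a and %b depend on the locale, the sketch switches LC_TIME to "C" first; any row that fails to parse comes back as NA.

```r
# %a (weekday) and %b (month) are matched against the current locale,
# so force English names for this session.
Sys.setlocale("LC_TIME", "C")

# Hypothetical sample values in the created_at format shown above
created_at <- c("Mon May 11 20:36:09 +0000 2015",
                "Mon May 11 20:36:10 +0000 2015",
                "Mon May 11 20:37:42 +0000 2015")

# The literal "+0000" means UTC, so tz = "UTC" is passed explicitly
# (an assumption: without it R interprets the times in the local zone).
dated <- as.POSIXct(created_at,
                    format = "%a %b %d %H:%M:%S +0000 %Y",
                    tz = "UTC")

class(dated)       # should include "POSIXct"
sum(is.na(dated))  # number of rows that failed to parse
range(dated)       # finite limits that plot()/hist() can work with
```

If sum(is.na(dated)) is not zero, the format string or the locale does not match the data, and a plot would run into the same kind of missing-value trouble as in the question.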

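This gives you the tweet dates in R's timestamp format. Then you can plot them like this. I left the sample Twitter feed running for 15 minutes or so. Here is the result: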

❥ hist(dated_tweets, breaks ="secs", freq = TRUE)
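If you'd rather have the counts as numbers (or want coarser bins than one second), one possible alternative to hist is to bin the timestamps with cut() and tabulate them. This is only a sketch under my own assumptions: the sample timestamps are invented, and barplot is just one way to draw the result.

```r
# English weekday/month names for parsing, regardless of system locale
Sys.setlocale("LC_TIME", "C")

# Hypothetical sample values; in practice use tweet_df$created_at
created_at <- c("Mon May 11 20:36:09 +0000 2015",
                "Mon May 11 20:36:10 +0000 2015",
                "Mon May 11 20:37:42 +0000 2015")

dated_tweets <- as.POSIXct(created_at,
                           format = "%a %b %d %H:%M:%S +0000 %Y",
                           tz = "UTC")

# cut() on POSIXct values accepts interval names such as "min" or "hour";
# table() then counts how many tweets fall into each interval.
per_minute <- table(cut(dated_tweets, breaks = "min"))
print(per_minute)

# Draw the per-minute counts; las = 2 rotates the long time labels
barplot(per_minute, las = 2, ylab = "tweets per minute")
```

The table is handy if you want to post-process the counts, for example to find the busiest minute with names(which.max(per_minute)).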

