Question

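I'm using the Twitter API to fetch tweets for a search term entered by the user and store them in a JSON file, as shown in the code below. I'm only interested in the tweet text and nothing else. How can I extract the text and ignore everything else? The end goal is to clean the individual tweets and run sentiment analysis on them. Thanks!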

    import json
    import tweepy

    consumerKey = "xxx"
    consumerSecret = "xxx"
    accessToken = "xxx"
    accessTokenSecret = "xxx"

    auth = tweepy.OAuthHandler(consumerKey, consumerSecret)
    auth.set_access_token(accessToken, accessTokenSecret)
    api = tweepy.API(auth)

    # Specify search term and count of tweets:
    searchTerm = input("Enter topic: ")
    limit = int(input("Enter the maximum number of tweets: "))

    # api.search is the tweepy 3.x name; tweepy 4.x renamed it to api.search_tweets
    tweets = tweepy.Cursor(api.search, q=searchTerm, count=limit, lang="en", tweet_mode='extended').items(limit)

    # Open the file once: opening it with 'w' inside the loop would
    # overwrite it on every iteration and keep only the last tweet.
    with open('tweets.json', 'w', encoding='utf8') as file:
        for tweet in tweets:
            json.dump(tweet._json, file)  # raw tweet dictionary
            file.write('\n')              # one JSON object per line

Answer 1

From reading the Twitter API documentation, I can see the structure of the JSON:

    "tweet": {
      "created_at": "Thu Apr 06 15:24:15 +0000 2017",
      "id_str": "850006245121695744",
      "text": "1\/ Today we\u2019re sharing our vision for the future of the Twitter API platform!\nhttps:\/\/t.co\/XweGngmxlP",
      "user": {
        "id": 2244994945,
        "name": "Twitter Dev",
        "screen_name": "TwitterDev",
        "location": "Internet",

This is only a small part of the JSON returned for a tweet.

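As you may know, a JSON object is essentially a dictionary: once parsed in Python it becomes an ordinary `dict`, so you can work with it like any other dictionary.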
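A minimal sketch of that idea, assuming the tweet is available as a raw JSON string (for example, a line read back from a file): `json.loads` turns a JSON object into a Python `dict`. The string below is an abbreviated copy of the sample tweet above, closed off so that it parses. With tweepy you usually skip this step, because `tweet._json` is already a `dict`.

```python
import json

# Abbreviated copy of the sample tweet above, closed off so that it parses.
# A raw string keeps the JSON escapes (\/, \u2019, \n) for json.loads to decode.
raw = r'''{
  "created_at": "Thu Apr 06 15:24:15 +0000 2017",
  "id_str": "850006245121695744",
  "text": "1\/ Today we\u2019re sharing our vision for the future of the Twitter API platform!\nhttps:\/\/t.co\/XweGngmxlP"
}'''

tweet = json.loads(raw)  # JSON object -> ordinary Python dict

print(type(tweet).__name__)            # dict
print(tweet["id_str"])                 # 850006245121695744
print(tweet["text"].splitlines()[-1])  # https://t.co/XweGngmxlP
```

The JSON escapes are decoded on load: `\/` becomes `/`, `\u2019` becomes a right single quote, and `\n` becomes a real line break.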

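To do this in Python, you need to know the key whose value you want. If I may make a suggestion, consider using the `requests` module for your project; it is simpler and easier to understand.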
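Which key to use depends on how the tweets were requested. With `tweet_mode='extended'` (as in the question's code) the text is stored under `full_text`; otherwise it is under `text`. A small sketch of a lookup that handles both; `get_tweet_text` is a hypothetical helper name, and the dictionaries are made-up stand-ins for real tweets:

```python
def get_tweet_text(tweet_dict):
    """Return the text of a raw tweet dictionary (hypothetical helper).

    Extended mode stores the text under "full_text"; compatibility mode
    uses "text". Returns None if neither key is present.
    """
    if "full_text" in tweet_dict:
        return tweet_dict["full_text"]
    return tweet_dict.get("text")


# Made-up stand-ins for tweet._json dictionaries
extended = {"id_str": "1", "full_text": "hello from extended mode"}
compat = {"id_str": "2", "text": "hello from compat mode"}

print(get_tweet_text(extended))  # hello from extended mode
print(get_tweet_text(compat))    # hello from compat mode
print(get_tweet_text({}))        # None
```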

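    import json

    # `tweets` is the Cursor from the question; each `tweet` is a tweepy
    # Status object, and tweet._json is its raw dictionary.
    with open('tweets.json', 'w', encoding='utf8') as file:
        for tweet in tweets:
            # tweet_mode='extended' stores the text under "full_text", not "text"
            json.dump(tweet._json["full_text"], file)
            file.write('\n')  # one tweet per line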
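Writing one JSON-encoded string per line also makes the file easy to read back for further analysis, because `json.dump` escapes any line breaks inside a tweet. A self-contained sketch of the round trip; the sample dictionaries are made up and stand in for `tweet._json`, and the temporary file replaces `tweets.json`:

```python
import json
import os
import tempfile

# Made-up stand-ins for tweet._json dictionaries
raw_tweets = [
    {"id_str": "1", "full_text": "first tweet\nwith a line break"},
    {"id_str": "2", "full_text": "second tweet"},
]

path = os.path.join(tempfile.mkdtemp(), "tweets.jsonl")

# Write: json.dump escapes the "\n" inside the text,
# so every tweet stays on exactly one line of the file.
with open(path, "w", encoding="utf8") as f:
    for t in raw_tweets:
        json.dump(t["full_text"], f)
        f.write("\n")

# Read back: decode each line to recover the original text.
with open(path, encoding="utf8") as f:
    texts = [json.loads(line) for line in f]

print(texts)  # ['first tweet\nwith a line break', 'second tweet']
```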

Instead of `json.dump`, you can also write the text directly with `file.write`:

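    # Open the file once so earlier tweets are not overwritten;
    # the output is plain text, one tweet per line.
    with open('tweets.txt', 'w', encoding='utf8') as file:
        for tweet in tweets:
            file.write(tweet._json["full_text"] + '\n')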
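One catch with a plain-text file: tweet text can itself contain line breaks (the sample tweet above has one before its link), so a single tweet could span several lines. A minimal sketch, assuming one line per tweet is what you want, that collapses internal whitespace before writing; the sample texts and the temporary path are made up:

```python
import os
import tempfile

# Made-up tweet texts; the first contains an embedded line break
texts = ["first tweet\nwith a line break", "second tweet"]

path = os.path.join(tempfile.mkdtemp(), "tweets.txt")

with open(path, "w", encoding="utf8") as f:
    for text in texts:
        # split()/join() collapses newlines and repeated spaces into single spaces
        f.write(" ".join(text.split()) + "\n")

with open(path, encoding="utf8") as f:
    lines = f.read().splitlines()

print(lines)  # ['first tweet with a line break', 'second tweet']
```

This loses the original line breaks; if they matter for your analysis, the JSON-per-line approach keeps them.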

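The idea is to look up the key whose value you want. Each tweet is a dictionary, and some of its values are themselves dictionaries (for example `user`), so you index once per level of nesting. In this case we only want the value of the text key (`full_text` when using `tweet_mode='extended'`, otherwise `text`).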
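A short sketch of top-level versus nested lookups, using a dictionary shaped like the sample tweet above (already decoded, as `tweet._json` would be). `.get()` is useful for keys that only some tweets have; `retweeted_status`, for instance, is present only on retweets:

```python
# Dictionary shaped like the sample tweet above (already decoded)
tweet = {
    "id_str": "850006245121695744",
    "text": "1/ Today we\u2019re sharing our vision for the future of the Twitter API platform!\nhttps://t.co/XweGngmxlP",
    "user": {
        "id": 2244994945,
        "name": "Twitter Dev",
        "screen_name": "TwitterDev",
        "location": "Internet",
    },
}

# Top-level key: one lookup
text = tweet["text"]

# Nested key: "user" maps to another dict, so index a second time
screen_name = tweet["user"]["screen_name"]

# .get() with a default avoids KeyError when a key may be absent;
# "retweeted_status" only appears on retweets
original = tweet.get("retweeted_status", {}).get("text")

print(screen_name)  # TwitterDev
print(original)     # None
```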

Hope it helps!

How to access the tweet text in a JSON file to perform further analysis?
