Cleaning tweet JSON files with Python

Time: 2017-11-04 15:20:26

Tags: python json twitter

I have Twitter JSON files and want to extract specific information from them. An example of this JSON can be found here: https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/intro-to-tweet-json

{
    "created_at": "Thu Apr 06 15:24:15 +0000 2017",
    "id_str": "850006245121695744",
    "text": "1\/ Today we\u2019re sharing our vision for the future of the Twitter API platform!\nhttps:\/\/t.co\/XweGngmxlP",
    "user": {
      "id": 2244994945,
      "name": "Twitter Dev",
      "screen_name": "TwitterDev",
      "location": "Internet",
      "url": "https:\/\/dev.twitter.com\/",
      "description": "Your official source for Twitter Platform news, updates & events. Need technical help? Visit https:\/\/twittercommunity.com\/ \u2328\ufe0f #TapIntoTwitter"
    },
    "place": {

    },
    "entities": {
      "hashtags": [

      ],
      "urls": [
        {
          "url": "https:\/\/t.co\/XweGngmxlP",
          "unwound": {
            "url": "https:\/\/cards.twitter.com\/cards\/18ce53wgo4h\/3xo1c",
            "title": "Building the Future of the Twitter API Platform"
          }
        }
      ],
      "user_mentions": [

      ]
    }
  }

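I am trying to remove some items I don't need, such as id_str. So I created a list containing the names of the keys I do need, and I iterate over the JSON file (one file holds more than a million tweets). I have searched for similar questions and tried to implement what the replies suggested.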

tags = ["created_at", "text", "retweet_count",
        "friends_count","followers_count","verified","place"]
for line in json_file:
    try:
        data = json.loads(line)
        for i in data.keys():
            if i not in tags:
                try:
                    del data[i]
                except:
                    continue
    except:
        continue

for line in json_file:
    data = json.loads(line)
    print(data)

However, my json_file ends up empty and nothing is printed at the end. Instead of del data[i], I have also tried several other variants, such as

del data[str(i)]
data.pop(i)

Thanks in advance!

1 answer:

Answer 0: (score: -1)

You need to read the JSON file in one go with json.loads(). All the keys sit one level below data["tweet"].

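import json

# Read the whole file and parse it as a single JSON document
with open("test.json") as f:
    data = json.loads(f.read())

tags = ["created_at", "text", "retweet_count", "friends_count","followers_count","verified","place"]

# Iterate over a copy of the keys: deleting from a dict while
# iterating over it raises RuntimeError in Python 3
for i in list(data["tweet"].keys()):
    if i not in tags:
        del data["tweet"][i]

print(data)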
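
If the file instead holds one tweet per line, as the loop in the question suggests, a line-by-line approach may fit better. The following is only a minimal sketch under several assumptions: that each line is a standalone JSON object, that friends_count, followers_count and verified live under the nested "user" object rather than at the top level, and that flattening them into the result is acceptable. The names filter_tweet and filter_lines and the in-memory sample are hypothetical, made up for illustration. It builds a new dict for each tweet instead of deleting keys during iteration, and it catches only JSON decoding errors so that other bugs are not silently swallowed.

```python
import io
import json

# Keys to keep from the top-level tweet object (taken from the question's list).
TWEET_KEYS = {"created_at", "text", "retweet_count", "place"}
# Keys assumed to live under the nested "user" object.
USER_KEYS = {"friends_count", "followers_count", "verified"}


def filter_tweet(tweet):
    """Return a new dict with only the wanted keys; the input is not mutated."""
    kept = {k: v for k, v in tweet.items() if k in TWEET_KEYS}
    user = tweet.get("user") or {}
    kept.update({k: v for k, v in user.items() if k in USER_KEYS})
    return kept


def filter_lines(lines):
    """Yield a filtered dict for every line that parses as a JSON object."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines only
        if isinstance(tweet, dict):
            yield filter_tweet(tweet)


# Small in-memory stand-in for the real file (hypothetical data).
sample = io.StringIO(
    '{"created_at": "Thu Apr 06 15:24:15 +0000 2017", '
    '"id_str": "850006245121695744", "text": "hello", '
    '"user": {"id": 2244994945, "verified": true}}\n'
    'not json\n'
)
cleaned = list(filter_lines(sample))
print(cleaned)
```

With a real file, the same generator could be fed an open file handle, and each result written out with json.dumps, so that a million-tweet file is never held in memory at once. Note also that a file object can be iterated only once: a second for loop over the same handle yields nothing unless the file is reopened or rewound with seek(0), which is likely why the second loop in the question prints nothing.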