I have a Twitter JSON file and want to extract specific information from it. An example of this JSON can be found here: https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/intro-to-tweet-json
    {
      "created_at": "Thu Apr 06 15:24:15 +0000 2017",
      "id_str": "850006245121695744",
      "text": "1\/ Today we\u2019re sharing our vision for the future of the Twitter API platform!\nhttps:\/\/t.co\/XweGngmxlP",
      "user": {
        "id": 2244994945,
        "name": "Twitter Dev",
        "screen_name": "TwitterDev",
        "location": "Internet",
        "url": "https:\/\/dev.twitter.com\/",
        "description": "Your official source for Twitter Platform news, updates & events. Need technical help? Visit https:\/\/twittercommunity.com\/ \u2328\ufe0f #TapIntoTwitter"
      },
      "place": {},
      "entities": {
        "hashtags": [],
        "urls": [
          {
            "url": "https:\/\/t.co\/XweGngmxlP",
            "unwound": {
              "url": "https:\/\/cards.twitter.com\/cards\/18ce53wgo4h\/3xo1c",
              "title": "Building the Future of the Twitter API Platform"
            }
          }
        ],
        "user_mentions": []
      }
    }
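I am trying to delete some items that I don't need, such as id_str.
So I created a list containing the names of the keys I need and iterate over this JSON file (one file has more than a million tweets). I have searched for similar questions and tried to implement what the replies suggested.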
    tags = ["created_at", "text", "retweet_count",
            "friends_count", "followers_count", "verified", "place"]
    for line in json_file:
        try:
            data = json.loads(line)
            for i in data.keys():
                if i not in tags:
                    try:
                        del data[i]
                    except:
                        continue
        except:
            continue
    for line in json_file:
        data = json.loads(line)
        print(data)
However, my json_file ends up empty, and nothing is printed at the end.
Instead of del data[i], I tried several different approaches, such as del data[str(i)] and data.pop(i).
Thanks in advance!
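The "nothing is printed" symptom can be reproduced in isolation. The sketch below assumes json_file is an open file object that is looped over twice, as in the code above; io.StringIO stands in for the real file, and the two sample lines are made up for illustration. A file object is a one-shot iterator, so a second loop over it starts at end-of-file unless the stream is rewound.

```python
import io

# Stand-in for an open file: two JSON lines in an in-memory text stream.
json_file = io.StringIO('{"text": "a"}\n{"text": "b"}\n')

first_pass = [line for line in json_file]   # consumes the stream
second_pass = [line for line in json_file]  # stream is already at end-of-file

print(len(first_pass), len(second_pass))

json_file.seek(0)  # rewind so the lines can be read again
third_pass = [line for line in json_file]
print(len(third_pass))
```

Note also that deleting keys from data only changes the in-memory dict for that line; nothing is written back to the file.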
Answer 0 (score: -1)
You need to read the JSON file in one go with json.loads().
All of the tags appear under data["tweet"].
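    import json

    # Read the whole file and parse it as a single JSON document.
    with open("test.json") as f:
        data = json.loads(f.read())

    tags = ["created_at", "text", "retweet_count", "friends_count",
            "followers_count", "verified", "place"]

    # Iterate over a copy of the keys: deleting from a dict while
    # iterating over it raises RuntimeError in Python 3.
    for i in list(data["tweet"].keys()):
        if i not in tags:
            del data["tweet"][i]
    print(data)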
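If the file instead holds one tweet per line, as the loop in the question suggests, an alternative is to build a new dict per line rather than deleting keys in place. The following is a minimal sketch under that assumption, not a definitive implementation; the names TAGS, filter_tweet, and filter_lines are hypothetical, and the sample lines are illustrative only.

```python
import json

# Keys to keep; the same names as the tags list in the question.
TAGS = {"created_at", "text", "retweet_count", "friends_count",
        "followers_count", "verified", "place"}

def filter_tweet(tweet, keep=TAGS):
    """Return a new dict with only the top-level keys listed in `keep`."""
    return {key: value for key, value in tweet.items() if key in keep}

def filter_lines(lines, keep=TAGS):
    """Yield filtered tweets from an iterable of JSON lines.

    Blank lines, lines that are not valid JSON, and lines that do not
    decode to a JSON object are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(tweet, dict):
            yield filter_tweet(tweet, keep)

# Illustrative input: one valid tweet line and one malformed line.
sample = [
    '{"created_at": "Thu Apr 06 15:24:15 +0000 2017", '
    '"id_str": "850006245121695744", "text": "hello"}',
    'not json',
]
for tweet in filter_lines(sample):
    print(tweet)
```

An open file object can be passed to filter_lines directly, so a large file is processed one line at a time without loading it all into memory. Only top-level keys are checked, so everything under "user" is dropped together with "user", because that key is not in the list.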