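I am currently trying to filter geolocated tweets out of a JSON file of tweets that I collected through the Twitter Streaming API. Because I have to analyze the data for a university project, I would like to include many attributes in the GeoJSON file, so that I get as much metadata as possible for each tweet. Since my coding skills are almost non-existent, I used the following code from an introductory Python book I found.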
import json
from argparse import ArgumentParser


def get_parser():
    parser = ArgumentParser()
    parser.add_argument('--tweets')
    parser.add_argument('--geojson')
    return parser


if __name__ == '__main__':
    parser = get_parser()
    args = parser.parse_args()

    # Read Tweet collection and build geo data structure
    with open(args.tweets, 'r') as f:
        geo_data = {
            "type": "FeatureCollection",
            "features": []
        }
        for line in f:
            tweet = json.loads(line)
            try:
                if tweet['coordinates']:
                    geo_json_feature = {
                        "type": "Feature",
                        "geometry": {
                            "type": "Point",
                            "coordinates": tweet['coordinates']['coordinates']
                        },
                        "properties": {
                            "text": tweet['text'],
                            "created_at": tweet['created_at']
                        }
                    }
                    geo_data['features'].append(geo_json_feature)
            except KeyError:
                # Skip if json doc is not a tweet (errors, etc.)
                continue

    # Save geo data
    with open(args.geojson, 'w') as fout:
        fout.write(json.dumps(geo_data, indent=4))
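As you can see, this code currently extracts only the attributes "text" and "created_at". As mentioned above, I would like to add more attributes to extract from my data, such as the Twitter username ("screen_name") or the location specified in the user's profile ("location"). However, when I try to extend the code as in the example below, the file that is created is just an empty FeatureCollection (the code with only the "text" and "created_at" attributes works fine).
"properties": {
    "text": tweet['text'],
    "created_at": tweet['created_at'],
    "location": tweet['location']
}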
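What am I missing here? I thought that maybe the attributes have to be addressed at exactly the position they have in my original JSONL file, but as you can see below, that does not seem to match the two attributes that are already included. Can someone point out my mistake?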
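One way to test that hypothesis in isolation is the minimal sketch below. It assumes a trimmed, hypothetical stand-in for a single tweet (only a few fields kept) and a hypothetical helper, build_properties, that mimics the try/except in the script. It is not the book's code, only an illustration of how a failed key lookup could end up as a silently skipped tweet.

```python
# Trimmed, hypothetical stand-in for one line of the tweets file;
# only the fields needed for the demonstration are kept.
tweet = {
    "created_at": "Wed May 22 18:22:09 +0000 2019",
    "text": "groenlinks #eu #vote",
    "user": {"screen_name": "ingek73", "location": "Nijmegen, the Netherlands"},
    "coordinates": {"type": "Point", "coordinates": [5.8611, 51.8433]},
}


def build_properties(tweet, nested):
    """Return the properties dict, or None if a key lookup fails."""
    try:
        return {
            "text": tweet["text"],
            "created_at": tweet["created_at"],
            # Lookup via "user" vs. top-level lookup (as in the extended snippet).
            "location": tweet["user"]["location"] if nested else tweet["location"],
        }
    except KeyError:
        # Same effect as the `continue` in the script: the tweet is dropped silently.
        return None


top_level_result = build_properties(tweet, nested=False)
nested_result = build_properties(tweet, nested=True)
print(top_level_result)
print(nested_result)
```

If the top-level lookup fails for every tweet in the file, every tweet would be skipped by the except branch, which would be consistent with an empty FeatureCollection and no error message.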
A sample of my data:
{"created_at":"Wed May 22 18:22:09 +0000 2019","id":1131263996806475776,"id_str":"1131263996806475776","text":"groenlinks #eu #vote #euelections2019 #greens #groenlinks #thenetherlands #europeseverkiezingen\u2026 https:\/\/t.co\/HCkdUhW8Sz","source":"\u003ca href=\"http:\/\/instagram.com\" rel=\"nofollow\"\u003eInstagram\u003c\/a\u003e","truncated":true,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":36635431,"id_str":"36635431","name":"Inge Kersten \uf8ff","screen_name":"ingek73","location":"Nijmegen, the Netherlands","url":"http:\/\/www.facebook.com\/ingek73","description":"Nerdfighter from Groesbeek&living in Nijmegen, love books,tv,nature,films,music,London,Harry Potter,Disney,Sherlock,music,volunteering, museums etc!","translator_type":"none","protected":false,"verified":false,"followers_count":1909,"friends_count":2194,"listed_count":87,"favourites_count":329,"statuses_count":61302,"created_at":"Thu Apr 30 10:38:44 +0000 
2009","utc_offset":null,"time_zone":null,"geo_enabled":true,"lang":null,"contributors_enabled":false,"is_translator":false,"profile_background_color":"0099B9","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme4\/bg.gif","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme4\/bg.gif","profile_background_tile":true,"profile_link_color":"0099B9","profile_sidebar_border_color":"5ED4DC","profile_sidebar_fill_color":"95E8EC","profile_text_color":"3C3940","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/1071659525739085824\/AY8WQjrw_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/1071659525739085824\/AY8WQjrw_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/36635431\/1526053269","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":{"type":"Point","coordinates":[51.8433,5.8611]},"coordinates":{"type":"Point","coordinates":[5.8611,51.8433]},"place":{"id":"ef0bc536202c76e3","url":"https:\/\/api.twitter.com\/1.1\/geo\/id\/ef0bc536202c76e3.json","place_type":"city","name":"Nijmegen","full_name":"Nijmegen, Nederland","country_code":"NL","country":"The Netherlands","bounding_box":{"type":"Polygon","coordinates":[[[5.757652,51.790554],[5.757652,51.894741],[5.908260,51.894741],[5.908260,51.790554]]]},"attributes":{}},"contributors":null,"is_quote_status":false,"extended_tweet":{"full_text":"groenlinks #eu #vote #euelections2019 #greens #groenlinks #thenetherlands #europeseverkiezingen #europeanelections2019 @ Nijmegen, Netherlands 
https:\/\/t.co\/VfVke65HtO","display_text_range":[0,166],"entities":{"hashtags":[{"text":"eu","indices":[11,14]},{"text":"vote","indices":[15,20]},{"text":"euelections2019","indices":[21,37]},{"text":"greens","indices":[38,45]},{"text":"groenlinks","indices":[46,57]},{"text":"thenetherlands","indices":[58,73]},{"text":"europeseverkiezingen","indices":[74,95]},{"text":"europeanelections2019","indices":[96,118]}],"urls":[{"url":"https:\/\/t.co\/VfVke65HtO","expanded_url":"https:\/\/www.instagram.com\/p\/BxxlJ1zi-Qa\/?igshid=48asdzarebcg","display_url":"instagram.com\/p\/BxxlJ1zi-Qa\/\u2026","indices":[143,166]}],"user_mentions":[],"symbols":[]}},"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"eu","indices":[11,14]},{"text":"vote","indices":[15,20]},{"text":"euelections2019","indices":[21,37]},{"text":"greens","indices":[38,45]},{"text":"groenlinks","indices":[46,57]},{"text":"thenetherlands","indices":[58,73]},{"text":"europeseverkiezingen","indices":[74,95]}],"urls":[{"url":"https:\/\/t.co\/HCkdUhW8Sz","expanded_url":"https:\/\/twitter.com\/i\/web\/status\/1131263996806475776","display_url":"twitter.com\/i\/web\/status\/1\u2026","indices":[97,120]}],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"nl","timestamp_ms":"1558549329364"}
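Because a line like the one above is hard to read by eye, a small helper can list where a given key actually sits in the nested structure. The sketch below is only an illustration: find_key_paths is a hypothetical helper, and the JSON string is a shortened excerpt of the sample rather than the full line.

```python
import json


def find_key_paths(obj, key, prefix=""):
    """Recursively collect dotted paths at which `key` occurs in nested dicts/lists."""
    paths = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            path = f"{prefix}.{k}" if prefix else k
            if k == key:
                paths.append(path)
            paths.extend(find_key_paths(v, key, path))
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            paths.extend(find_key_paths(item, key, f"{prefix}[{i}]"))
    return paths


# Shortened excerpt of the sample line (assumption: most fields omitted).
line = (
    '{"created_at": "Wed May 22 18:22:09 +0000 2019", '
    '"text": "groenlinks #eu", '
    '"user": {"screen_name": "ingek73", '
    '"location": "Nijmegen, the Netherlands", '
    '"created_at": "Thu Apr 30 10:38:44 +0000 2009"}}'
)
tweet = json.loads(line)

location_paths = find_key_paths(tweet, "location")
created_at_paths = find_key_paths(tweet, "created_at")
screen_name_paths = find_key_paths(tweet, "screen_name")
print(location_paths)
print(created_at_paths)
print(screen_name_paths)
```

In this excerpt, "created_at" appears both at the top level and inside "user", while "location" and "screen_name" appear only inside "user".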