Adding properties to a FeatureCollection in a GeoJSON file

Date: 2019-07-03 14:13:35

Tags: python json twitter geojson

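I'm currently trying to filter geolocated tweets out of a JSON file of tweets that I collected through the Twitter Streaming API. Because I have to analyze the data for a university project, I want to include as many properties as possible in the GeoJSON file, so that I get as much metadata as possible for each tweet. Since my coding skills are close to nonexistent, I used the following code from an introductory Python book I found: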

import json
from argparse import ArgumentParser

def get_parser():
    parser = ArgumentParser()
    parser.add_argument('--tweets')
    parser.add_argument('--geojson')
    return parser

if __name__ == '__main__':
    parser = get_parser()
    args = parser.parse_args()
    # Read Tweet collection and build geo data structure
    with open(args.tweets, 'r') as f:
        geo_data = {
            "type": "FeatureCollection",
            "features": []
        }
        for line in f:
            tweet = json.loads(line)
            try:
                if tweet['coordinates']:
                    geo_json_feature = {
                        "type": "Feature",
                        "geometry": {
                            "type": "Point",
                            "coordinates": tweet['coordinates']['coordinates']
                        },
                        "properties": {
                            "text": tweet['text'],
                            "created_at": tweet['created_at']
                        }
                    }
                    geo_data['features'].append(geo_json_feature)
            except KeyError:
                # Skip if json doc is not a tweet (errors, etc.)
                continue
        # Save geo data
        with open(args.geojson, 'w') as fout:
            fout.write(json.dumps(geo_data, indent=4))
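For reference, the feature-building step inside the loop can be pulled out into a small function. This is only a sketch; the name `tweet_to_feature` is hypothetical. It also highlights a detail visible in the sample data further down: the tweet's `coordinates` field stores `[longitude, latitude]`, which is the order GeoJSON expects, while the older `geo` field stores `[latitude, longitude]`, so reading `tweet['coordinates']` is the right choice.

```python
import json


def tweet_to_feature(tweet):
    """Build a GeoJSON Feature from one decoded tweet (hypothetical helper).

    Returns None when the tweet carries no point coordinates.
    """
    coords = tweet.get("coordinates")
    if not coords:
        return None
    return {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            # Twitter's 'coordinates' field is already [longitude, latitude],
            # the order GeoJSON expects ('geo' is [latitude, longitude]).
            "coordinates": coords["coordinates"],
        },
        "properties": {
            "text": tweet["text"],
            "created_at": tweet["created_at"],
        },
    }


line = ('{"text": "hi", "created_at": "Wed May 22 18:22:09 +0000 2019", '
        '"coordinates": {"type": "Point", "coordinates": [5.8611, 51.8433]}}')
feature = tweet_to_feature(json.loads(line))
print(feature["geometry"]["coordinates"])  # [5.8611, 51.8433]
```

Tweets whose `coordinates` is `null` come back as `None`, mirroring the `if tweet['coordinates']:` check in the script.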

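As you can see, this code currently extracts only the properties "text" and "created_at". As mentioned above, I'd like to extract more properties from my data, such as the Twitter username ("screen_name") or the location given in the user's profile ("location"). However, when I extend the code as in the example below, the resulting file is just an empty FeatureCollection (the version with only the "text" and "created_at" properties works fine):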

                        "properties": {
                            "text": tweet['text'],
                            "created_at": tweet['created_at'],
                            "location": tweet['location']
                        }
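Judging by the sample tweet further down, `location` is not a top-level key of a tweet; it sits inside the nested `user` object. If that is the case, `tweet['location']` raises `KeyError`, the `except KeyError: continue` in the loop silently skips the tweet, and because this happens for every tweet the output stays empty. A minimal reproduction, using a trimmed-down tweet dict made up for illustration:

```python
# Trimmed-down tweet for illustration: 'location' lives under 'user'.
tweet = {
    "text": "groenlinks #eu",
    "created_at": "Wed May 22 18:22:09 +0000 2019",
    "user": {"screen_name": "ingek73", "location": "Nijmegen, the Netherlands"},
    "coordinates": {"type": "Point", "coordinates": [5.8611, 51.8433]},
}

features = []
for t in [tweet]:
    try:
        if t["coordinates"]:
            features.append({
                "text": t["text"],
                "created_at": t["created_at"],
                "location": t["location"],  # KeyError: no top-level 'location'
            })
    except KeyError:
        continue  # the error is swallowed, so the tweet is silently dropped

print(features)  # []

# Indexing through the nested 'user' object works:
print(tweet["user"]["location"])     # Nijmegen, the Netherlands
print(tweet["user"]["screen_name"])  # ingek73
```

Temporarily printing the caught exception inside the `except` block is a quick way to see which key is missing.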

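What am I missing here? I thought the properties might have to be listed in exactly the same position as in my original JSONL file, but as you can see below, that isn't the case for the two properties that already work either. Can anyone point out my mistake?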
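On the ordering question: once a line is decoded with `json.loads`, the result is a Python `dict`, and lookups go by key name rather than by position, so the order in which properties appear should not matter. A quick check with two made-up JSON strings:

```python
import json

a = json.loads('{"text": "hi", "created_at": "2019"}')
b = json.loads('{"created_at": "2019", "text": "hi"}')

print(a == b)                # True: dict equality ignores key order
print(a["text"], b["text"])  # hi hi
```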

Example of my data:

{"created_at":"Wed May 22 18:22:09 +0000 2019","id":1131263996806475776,"id_str":"1131263996806475776","text":"groenlinks #eu #vote #euelections2019 #greens #groenlinks #thenetherlands #europeseverkiezingen\u2026 https:\/\/t.co\/HCkdUhW8Sz","source":"\u003ca href=\"http:\/\/instagram.com\" rel=\"nofollow\"\u003eInstagram\u003c\/a\u003e","truncated":true,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":36635431,"id_str":"36635431","name":"Inge Kersten \uf8ff","screen_name":"ingek73","location":"Nijmegen, the Netherlands","url":"http:\/\/www.facebook.com\/ingek73","description":"Nerdfighter from Groesbeek&living in Nijmegen, love books,tv,nature,films,music,London,Harry Potter,Disney,Sherlock,music,volunteering, museums etc!","translator_type":"none","protected":false,"verified":false,"followers_count":1909,"friends_count":2194,"listed_count":87,"favourites_count":329,"statuses_count":61302,"created_at":"Thu Apr 30 10:38:44 +0000 2009","utc_offset":null,"time_zone":null,"geo_enabled":true,"lang":null,"contributors_enabled":false,"is_translator":false,"profile_background_color":"0099B9","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme4\/bg.gif","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme4\/bg.gif","profile_background_tile":true,"profile_link_color":"0099B9","profile_sidebar_border_color":"5ED4DC","profile_sidebar_fill_color":"95E8EC","profile_text_color":"3C3940","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/1071659525739085824\/AY8WQjrw_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/1071659525739085824\/AY8WQjrw_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/36635431\/1526053269","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":{"type":"Point","coordinates":[51.8433,5.8611]},"coordinates":{"type":"Point","coordinates":[5.8611,51.8433]},"place":{"id":"ef0bc536202c76e3","url":"https:\/\/api.twitter.com\/1.1\/geo\/id\/ef0bc536202c76e3.json","place_type":"city","name":"Nijmegen","full_name":"Nijmegen, Nederland","country_code":"NL","country":"The Netherlands","bounding_box":{"type":"Polygon","coordinates":[[[5.757652,51.790554],[5.757652,51.894741],[5.908260,51.894741],[5.908260,51.790554]]]},"attributes":{}},"contributors":null,"is_quote_status":false,"extended_tweet":{"full_text":"groenlinks #eu #vote #euelections2019 #greens #groenlinks #thenetherlands #europeseverkiezingen #europeanelections2019 @ Nijmegen, Netherlands https:\/\/t.co\/VfVke65HtO","display_text_range":[0,166],"entities":{"hashtags":[{"text":"eu","indices":[11,14]},{"text":"vote","indices":[15,20]},{"text":"euelections2019","indices":[21,37]},{"text":"greens","indices":[38,45]},{"text":"groenlinks","indices":[46,57]},{"text":"thenetherlands","indices":[58,73]},{"text":"europeseverkiezingen","indices":[74,95]},{"text":"europeanelections2019","indices":[96,118]}],"urls":[{"url":"https:\/\/t.co\/VfVke65HtO","expanded_url":"https:\/\/www.instagram.com\/p\/BxxlJ1zi-Qa\/?igshid=48asdzarebcg","display_url":"instagram.com\/p\/BxxlJ1zi-Qa\/\u2026","indices":[143,166]}],"user_mentions":[],"symbols":[]}},"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"eu","indices":[11,14]},{"text":"vote","indices":[15,20]},{"text":"euelections2019","indices":[21,37]},{"text":"greens","indices":[38,45]},{"text":"groenlinks","indices":[46,57]},{"text":"thenetherlands","indices":[58,73]},{"text":"europeseverkiezingen","indices":[74,95]}],"urls":[{"url":"https:\/\/t.co\/HCkdUhW8Sz","expanded_url":"https:\/\/twitter.com\/i\/web\/status\/1131263996806475776","display_url":"twitter.com\/i\/web\/status\/1\u2026","indices":[97,120]}],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"nl","timestamp_ms":"1558549329364"}
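In the sample above, the profile fields (`screen_name`, `location`) sit under `user`, the place name sits under `place`, and, because `truncated` is `true`, the untruncated text is under `extended_tweet.full_text`. A sketch of a properties builder that follows those paths and falls back to `None` when a field is missing, so that one absent key does not drop the whole tweet. The helper name `build_properties` and the selection of fields are illustrative assumptions:

```python
import json


def build_properties(tweet):
    """Collect selected tweet metadata; missing fields become None (hypothetical helper)."""
    user = tweet.get("user") or {}
    place = tweet.get("place") or {}          # 'place' may be null in the JSON
    extended = tweet.get("extended_tweet") or {}
    return {
        # Prefer the full text of truncated tweets, else the short 'text'.
        "text": extended.get("full_text", tweet.get("text")),
        "created_at": tweet.get("created_at"),
        "screen_name": user.get("screen_name"),
        "location": user.get("location"),
        "place": place.get("full_name"),
        "lang": tweet.get("lang"),
    }


line = ('{"text": "short", "created_at": "Wed May 22 18:22:09 +0000 2019", '
        '"lang": "nl", "place": null, "extended_tweet": {"full_text": "long version"}, '
        '"user": {"screen_name": "ingek73", "location": "Nijmegen, the Netherlands"}}')
props = build_properties(json.loads(line))
print(props["text"])      # long version
print(props["location"])  # Nijmegen, the Netherlands
print(props["place"])     # None
```

In the original script, `"properties": build_properties(tweet)` could replace the hand-written properties dict; since `dict.get` never raises `KeyError`, only genuinely non-tweet lines would still be skipped by the `except` clause.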
