How to access the place and geo objects in a Twitter JSON object

Time: 2019-02-18 01:04:38

Tags: python json

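I am currently trying to access the place names and coordinates of tweets from a JSON file created by Twitter's API. Not all of my tweets contain these attributes, but some do, and I'd like to collect them. My current approach is: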

for line in tweets_json:
    try:
        tweet = json.loads(line.strip()) # only messages that contain a 'text' field are tweets
        tweet_id = (tweet['id']) # This is the tweet's id
        created_at = (tweet['created_at']) # when the tweet was posted
        text = (tweet['text']) # content of the tweet

        user_id = (tweet['user']['id']) # id of the user who posted the tweet
        hashtags = []
        for hashtag in tweet['entities']['hashtags']:
            hashtags.append(hashtag['text'])

        lat = []
        long = []
        for coordinates in tweet['coordinates']['coordinates']:
            lat.append(coordinates[0])
            long.append(coordinates[1])

        country_code = []
        place_name = []
        for place in tweet['place']:
            country_code.append(place['country_code'])
            place_name.append(place['full_name'])

    except:
        # the line read in is not valid JSON (errors sometimes occur)
        continue

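So far, no attributes other than the hashtags are being collected. Am I trying to access the wrong attributes? More information on the JSON object is available at https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object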

1 Answer:

Answer 0 (score: 0)

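By wrapping all of your code in a try/except block, you suppress every error that occurs, not just JSON parsing failures. That includes the KeyError raised when you access a 'coordinates' key that does not exist, and the TypeError raised when its value is null.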
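To make the effect concrete, here is a minimal sketch comparing a bare except with one that catches only the JSON error. The sample line and the function names parse_silently and parse_loudly are hypothetical, made up for illustration; they are not part of the question's code.

```python
import json

# Hypothetical sample line: a tweet without an exact location
line = '{"id": 1, "coordinates": null}'

def parse_silently(raw):
    """Mimics the question's pattern: a bare except hides the real error."""
    try:
        tweet = json.loads(raw)
        return tweet['coordinates']['coordinates']
    except:
        return 'skipped'

def parse_loudly(raw):
    """Catches only the JSON error, so other failures surface."""
    try:
        tweet = json.loads(raw)
    except json.JSONDecodeError:
        return 'bad json'
    return tweet['coordinates']['coordinates']

print(parse_silently(line))  # the failure is hidden
try:
    parse_loudly(line)
except TypeError as exc:
    print('real error:', type(exc).__name__)
```

With the bare except the line simply looks like bad input, while the narrower handler shows that the lookup itself is what failed.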


If some of the parsed tweet dictionaries do contain these keys and you want to collect them, you can do something like this:

import json
from json import JSONDecodeError

for line in tweets_json:
    # try to parse json; skip lines that are not valid JSON
    try:
        tweet = json.loads(line.strip()) # only messages that contain a 'text' field are tweets
    except JSONDecodeError:
        print('bad json')
        continue

    tweet_id = tweet['id'] # this is the tweet's id
    created_at = tweet['created_at'] # when the tweet was posted
    text = tweet['text'] # content of the tweet

    user_id = tweet['user']['id'] # id of the user who posted the tweet
    hashtags = []
    for hashtag in tweet['entities']['hashtags']:
        hashtags.append(hashtag['text'])

    lat = []
    long = []

    # 'coordinates' is null (None) for tweets without an exact location,
    # so check for a value rather than only for the key
    point = tweet.get('coordinates')
    if point and 'coordinates' in point:
        # the inner list is a GeoJSON point: [longitude, latitude]
        long.append(point['coordinates'][0])
        lat.append(point['coordinates'][1])

    country_code = []
    place_name = []
    # 'place' is a single object (or None), not a list of places
    place = tweet.get('place')
    if place:
        country_code.append(place['country_code'])
        place_name.append(place['full_name'])
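
The same checks can be pulled into a small helper, which makes them easy to try on a single tweet. This is only a sketch: the function name extract_location and both sample tweets are hypothetical, and it assumes the layout described in the tweet object documentation, where 'coordinates' and 'place' are each either null or a single object.

```python
def extract_location(tweet):
    """Return the tweet's location fields, with None for anything missing."""
    # Both fields can be absent or null, so fall back to an empty dict
    point = tweet.get('coordinates') or {}
    place = tweet.get('place') or {}
    # The inner list is a GeoJSON point: [longitude, latitude]
    lon, lat = point.get('coordinates') or (None, None)
    return {
        'lat': lat,
        'long': lon,
        'country_code': place.get('country_code'),
        'place_name': place.get('full_name'),
    }

# Hypothetical sample tweets, trimmed to the fields used here
geotagged = {
    'id': 1,
    'coordinates': {'type': 'Point', 'coordinates': [-75.1, 40.0]},
    'place': {'country_code': 'US', 'full_name': 'Philadelphia, PA'},
}
plain = {'id': 2, 'coordinates': None, 'place': None}

print(extract_location(geotagged))
print(extract_location(plain))
```

Returning None for missing fields keeps one record per tweet, so tweets without a location can be filtered later rather than dropped during parsing.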