Printing 'hashtags' from JSON to a CSV file using Python

Time: 2017-05-03 11:46:07

Tags: python json csv twitter hashtag

I'm a beginner and I'm using the following code:

f = open("singletweetwithtimezone.json", "r")
tweet_text = f.read()

import json

tweet_json = json.loads(tweet_text)

g = open("Singletweetcsvoutput.csv", "w")

g.write(tweet_json["created_at"]+"\t")
g.write(tweet_json["user"]["time_zone"]+"\t")
g.write(tweet_json["entities"]["hashtags"]["text"])

g.close()
f.close()

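Writing works for everything except the hashtags. I want it to write the text 'messi' to the CSV file, but with my limited knowledge I can't figure out what I'm doing wrong. I get the following error: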

g.write(tweet_json["entities"]["hashtags"]["text"])
TypeError: list indices must be integers, not str
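The error can be reproduced in isolation. The sketch below builds a small, hypothetical dictionary with the same shape as the tweet's entities node and indexes its hashtags value both ways. (The message quoted above is the Python 2 wording; Python 3 reports "list indices must be integers or slices, not str".)

```python
import json

# Hypothetical, trimmed-down stand-in for the tweet's "entities" node.
# "hashtags" is a JSON array, which json.loads turns into a Python list.
entities = json.loads('{"hashtags": [{"text": "Messi", "indices": [29, 35]}]}')

hashtags = entities["hashtags"]
print(type(hashtags))  # <class 'list'>

try:
    hashtags["text"]  # indexing a list with a string key fails
except TypeError as exc:
    print("TypeError:", exc)

# Index the list by position first, then the dict inside it by key.
print(hashtags[0]["text"])  # Messi
```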

The JSON tree is shown in the image I attached (what the tree looks like).

Can anyone help me?

Raw JSON:

{"created_at":"Sun Apr 23 21:04:13 +0000 2017","id":856252394233106432,"id_str":"856252394233106432","text":"RT @11FC_FR: Et \u00e0 la fin ... #Messi \n\ud83d\ude0d https:\/\/t.co\/uiyTnJJiKd","source":"\u003ca href=\"http:\/\/twitter.com\/download\/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":1628829458,"id_str":"1628829458","name":"LK20","screen_name":"KeevinGruny","location":null,"url":null,"description":"\u26bd\ufe0f\u26bd\ufe0f","protected":false,"verified":false,"followers_count":903,"friends_count":209,"listed_count":39,"favourites_count":7328,"statuses_count":59480,"created_at":"Sun Jul 28 21:50:55 +0000 2013","utc_offset":10800,"time_zone":"Athens","geo_enabled":true,"lang":"fr","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"1DA1F2","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/854838667428466688\/jE52U_LU_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/854838667428466688\/jE52U_LU_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/1628829458\/1492857072","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweeted_status":{"created_at":"Sun Apr 23 20:38:37 +0000 2017","id":856245950544838660,"id_str":"856245950544838660","text":"Et \u00e0 la fin ... #Messi \n\ud83d\ude0d https:\/\/t.co\/uiyTnJJiKd","display_text_range":[0,25],"source":"\u003ca href=\"http:\/\/twitter.com\/download\/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":2617563403,"id_str":"2617563403","name":"11FootballClub","screen_name":"11FC_FR","location":"3, All\u00e9e Cassard - NANTES","url":"http:\/\/www.11footballclub.com","description":"11FootballClub est un concept store unique et une boutique en ligne soign\u00e9e. Actus foot, nouveaut\u00e9s produits, promos et jeux concours","protected":false,"verified":false,"followers_count":49693,"friends_count":24561,"listed_count":54,"favourites_count":350,"statuses_count":1268,"created_at":"Fri Jul 11 15:21:20 +0000 2014","utc_offset":-25200,"time_zone":"Pacific Time (US & Canada)","geo_enabled":true,"lang":"fr","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"A16E1E","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/648580532322836480\/2lodFucd_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/648580532322836480\/2lodFucd_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/2617563403\/1443468740","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"retweet_count":155,
"favorite_count":92,"entities":{"hashtags":[{"text":"Messi","indices":[16,22]}],"urls":[],"user_mentions":[],"symbols":[],"media":[{"id":856245937915793408,"id_str":"856245937915793408","indices":[26,49],"media_url":"http:\/\/pbs.twimg.com\/media\/C-H_JNnXkAApCFh.jpg","media_url_https":"https:\/\/pbs.twimg.com\/media\/C-H_JNnXkAApCFh.jpg","url":"https:\/\/t.co\/uiyTnJJiKd","display_url":"pic.twitter.com\/uiyTnJJiKd","expanded_url":"https:\/\/twitter.com\/11FC_FR\/status\/856245950544838660\/photo\/1","type":"photo","sizes":{"small":{"w":680,"h":460,"resize":"fit"},"thumb":{"w":150,"h":150,"resize":"crop"},"medium":{"w":1200,"h":812,"resize":"fit"},"large":{"w":2048,"h":1386,"resize":"fit"}}}]},"extended_entities":{"media":[{"id":856245937915793408,"id_str":"856245937915793408","indices":[26,49],"media_url":"http:\/\/pbs.twimg.com\/media\/C-H_JNnXkAApCFh.jpg","media_url_https":"https:\/\/pbs.twimg.com\/media\/C-H_JNnXkAApCFh.jpg","url":"https:\/\/t.co\/uiyTnJJiKd","display_url":"pic.twitter.com\/uiyTnJJiKd","expanded_url":"https:\/\/twitter.com\/11FC_FR\/status\/856245950544838660\/photo\/1","type":"photo","sizes":{"small":{"w":680,"h":460,"resize":"fit"},"thumb":{"w":150,"h":150,"resize":"crop"},"medium":{"w":1200,"h":812,"resize":"fit"},"large":{"w":2048,"h":1386,"resize":"fit"}}}]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"fr"},"is_quote_status":false,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"Messi","indices":[29,35]}],"urls":[],"user_mentions":[{"screen_name":"11FC_FR","name":"11FootballClub","id":2617563403,"id_str":"2617563403","indices":[3,11]}],"symbols":[],"media":[{"id":856245937915793408,"id_str":"856245937915793408","indices":[39,62],"media_url":"http:\/\/pbs.twimg.com\/media\/C-H_JNnXkAApCFh.jpg","media_url_https":"https:\/\/pbs.twimg.com\/media\/C-H_JNnXkAApCFh.jpg","url":"https:\/\/t.co\/uiyTnJJiKd","display_url":"pic.twitter.com\/uiyTnJJiKd","expanded_url":"https:\/\/twitter.com\/11FC_FR\/status\/856245950544838660\/photo\/1","type":"photo","sizes":{"small":{"w":680,"h":460,"resize":"fit"},"thumb":{"w":150,"h":150,"resize":"crop"},"medium":{"w":1200,"h":812,"resize":"fit"},"large":{"w":2048,"h":1386,"resize":"fit"}},"source_status_id":856245950544838660,"source_status_id_str":"856245950544838660","source_user_id":2617563403,"source_user_id_str":"2617563403"}]},"extended_entities":{"media":[{"id":856245937915793408,"id_str":"856245937915793408","indices":[39,62],"media_url":"http:\/\/pbs.twimg.com\/media\/C-H_JNnXkAApCFh.jpg","media_url_https":"https:\/\/pbs.twimg.com\/media\/C-H_JNnXkAApCFh.jpg","url":"https:\/\/t.co\/uiyTnJJiKd","display_url":"pic.twitter.com\/uiyTnJJiKd","expanded_url":"https:\/\/twitter.com\/11FC_FR\/status\/856245950544838660\/photo\/1","type":"photo","sizes":{"small":{"w":680,"h":460,"resize":"fit"},"thumb":{"w":150,"h":150,"resize":"crop"},"medium":{"w":1200,"h":812,"resize":"fit"},"large":{"w":2048,"h":1386,"resize":"fit"}},"source_status_id":856245950544838660,"source_status_id_str":"856245950544838660","source_user_id":2617563403,"source_user_id_str":"2617563403"}]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"fr","timestamp_ms":"1492981453842"}

1 answer:

Answer 0 (score: 0)

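Start by printing the type of each node in the response. The error you're seeing happens because you are treating every node in the response as a dictionary and indexing it by key. In fact, hashtags (as shown in your screenshot and in the raw JSON) is an array, which json.loads turns into a Python list, so it has to be indexed by position before you can read the "text" key: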

tweet_json['entities']['hashtags'][0]['text']
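A minimal sketch of both suggestions, printing the type of every node on the path and guarding the [0] access with a length check. It uses a hypothetical, trimmed-down tweet with the same nesting as the one in the question rather than the full file.

```python
import json

# Hypothetical, trimmed-down tweet with the same nesting as the real one.
tweet_json = json.loads(
    '{"created_at": "Sun Apr 23 21:04:13 +0000 2017",'
    ' "entities": {"hashtags": [{"text": "Messi", "indices": [29, 35]}]}}'
)

# Walk the path one step at a time and print what each step returns.
node = tweet_json
for key in ["entities", "hashtags", 0, "text"]:
    node = node[key]
    print(repr(key), "->", type(node).__name__)
# 'entities' -> dict
# 'hashtags' -> list
# 0 -> dict
# 'text' -> str

# Guarded access: only use [0] when the list is non-empty,
# otherwise a tweet without hashtags would raise IndexError.
hashtags = tweet_json["entities"]["hashtags"]
first_hashtag = hashtags[0]["text"] if len(hashtags) > 0 else ""
print(first_hashtag)  # Messi
```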

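hashtags holds an array. In this case the array has length 1, so you can access its element with [0]; but because the length of these arrays varies (for a tweet with no hashtags the list is empty and [0] raises an IndexError), you should add a length check or loop over the array, somewhat like below. I like to use DictWriter from the csv library; even if it's overkill here, it can be reused to go through several tweets.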

import csv
import json


with open('.../input.json', 'r', encoding='utf-8') as inputfile:
    tweet_json = json.load(inputfile)

# Python 3: newline='' stops the csv module from adding blank lines between rows on Windows
with open('.../output.csv', 'w', newline='', encoding='utf-8') as csvfile:
    fieldnames = ['created_at', 'user', 'hashtags']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    # extract all the info you want to write
    created_at = tweet_json['created_at']
    # select the user's screen_name rather than the id
    user = tweet_json['user']['screen_name']
    hashtags = tweet_json['entities']['hashtags']
    # collect the text of every hashtag in the array (works for 0, 1 or many)
    hashes = list()
    for hashtag in hashtags:
        text = hashtag['text']
        # append to the hashes list
        hashes.append(text)
    # join the list into one space-separated string (e.g. "Messi") and write the row
    writer.writerow({"created_at": created_at, "user": user, "hashtags": " ".join(hashes)})
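To go through several tweets with the same DictWriter, one option is the sketch below. It assumes (not stated in the question) that the tweets are stored one JSON object per line, and it writes the fields the question asked for (created_at, time_zone, hashtags), tab-separated as in the original script. The function name tweets_to_csv and the sample tweets are hypothetical. It uses .get() because fields can be missing or null; time_zone, for example, is often null, which would also break the question's original "+" string concatenation.

```python
import csv
import io
import json


def tweets_to_csv(lines, out, delimiter="\t"):
    """Hypothetical helper: write one CSV row per tweet, one JSON tweet per line."""
    fieldnames = ["created_at", "time_zone", "hashtags"]
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter=delimiter)
    writer.writeheader()
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        tweet = json.loads(line)
        # .get() avoids a KeyError when a field is missing; "or" handles null
        user = tweet.get("user") or {}
        hashtags = tweet.get("entities", {}).get("hashtags", [])
        writer.writerow({
            "created_at": tweet.get("created_at", ""),
            "time_zone": user.get("time_zone") or "",
            # a loop-based join works for zero, one or many hashtags
            "hashtags": " ".join(h["text"] for h in hashtags),
        })


# Usage with two hypothetical, trimmed-down tweets.
sample = io.StringIO(
    '{"created_at": "Sun Apr 23 21:04:13 +0000 2017", "user": {"time_zone": "Athens"},'
    ' "entities": {"hashtags": [{"text": "Messi"}]}}\n'
    '{"created_at": "Sun Apr 23 21:05:00 +0000 2017", "user": {"time_zone": null},'
    ' "entities": {"hashtags": []}}\n'
)
out = io.StringIO()
tweets_to_csv(sample, out)
print(out.getvalue())
```

With real files, the same function can be called as tweets_to_csv(f, g), where f is the input opened with encoding='utf-8' and g is the output opened with 'w', newline='' and encoding='utf-8'.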