How do I scrape the favorite_count of a tweet from someone's timeline?

Date: 2016-01-16 12:50:59

Tags: python twitter scrape

If I run my code without the last line, getVal(tweet['retweeted_status']['favorite_count']), the scrape works. When I add that line, I get the error KeyError: 'retweeted_status'.

Does anyone know what I'm doing wrong?

q = "David_Cameron"
results = twitter_user_timeline(twitter_api, q)
print len(results)
# Show one sample search result by slicing the list...
# print json.dumps(results[0], indent=1)
csvfile = open(q + '_timeline.csv', 'w')
csvwriter = csv.writer(csvfile)
csvwriter.writerow(['created_at',
                'user-screen_name',
                'text',
                'coordinates lng',
                'coordinates lat',
                'place',
                'user-location',
                'user-geo_enabled',
                'user-lang',
                'user-time_zone',
                'user-statuses_count',
                'user-followers_count',
                'user-created_at'])
for tweet in results:
    csvwriter.writerow([tweet['created_at'],
                    getVal(tweet['user']['screen_name']),
                    getVal(tweet['text']),
                    getLng(tweet['coordinates']),
                    getLat(tweet['coordinates']),
                    getPlace(tweet['place']),
                    getVal(tweet['user']['location']),
                    getVal(tweet['user']['geo_enabled']),
                    getVal(tweet['user']['lang']),
                    getVal(tweet['user']['time_zone']),
                    getVal(tweet['user']['statuses_count']),
                    getVal(tweet['user']['followers_count']),
                    getVal(tweet['user']['created_at']), 
                    getVal(tweet['retweeted_status']['favorite_count']),
                    ])
print "done"
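Below is a minimal reproduction of the error. It uses two hypothetical, heavily trimmed tweet dicts: real API objects carry many more fields, and the values here are made up. The sketch is written for Python 3, whereas the question's code is Python 2.

```python
# Hypothetical, trimmed tweet dicts; values are invented for illustration.
original_tweet = {"text": "Hello"}
retweet = {
    "text": "RT @someone: Hello",
    "retweeted_status": {"text": "Hello", "favorite_count": 3},
}


def favorite_count_of_original(tweet):
    # Direct indexing, exactly like the question's last column.
    return tweet["retweeted_status"]["favorite_count"]


print(favorite_count_of_original(retweet))  # 3

try:
    favorite_count_of_original(original_tweet)
except KeyError as exc:
    # A non-retweet has no 'retweeted_status' key at all.
    print("KeyError:", exc)  # KeyError: 'retweeted_status'
```

The loop stops at the first tweet in the timeline that is not a retweet.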

2 answers:

Answer 0 (score: 1)

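According to the API documentation at https://dev.twitter.com/overview/api/tweets, this attribute may or may not be present. retweeted_status only appears when the tweet is a retweet.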

If it is not present, you can't access it. You can use the in operator to look it up safely, checking that the key exists before accessing it:

retweeted_favourite_count = tweet['retweeted_status']['favorite_count'] if 'retweeted_status' in tweet else None
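Here is the same check-before-access ("look before you leap") idea wrapped in a small function, as a Python 3 sketch. The name retweeted_favorite_count and the sample dicts are hypothetical. Note that the API key is spelled favorite_count, not favourite_count.

```python
def retweeted_favorite_count(tweet, default=None):
    """Return the original tweet's favorite_count if `tweet` is a retweet.

    The optional 'retweeted_status' key is checked with `in` before indexing,
    so ordinary tweets return `default` instead of raising KeyError.
    """
    if "retweeted_status" in tweet:
        return tweet["retweeted_status"]["favorite_count"]
    return default


# Hypothetical, trimmed tweet dicts for illustration only.
plain = {"text": "Hello"}
rt = {"text": "RT ...", "retweeted_status": {"favorite_count": 7}}

print(retweeted_favorite_count(rt))     # 7
print(retweeted_favorite_count(plain))  # None
```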

Or assume it is there and handle the case where it isn't:

try:
    retweeted_favourite_count = tweet['retweeted_status']['favorite_count']
except KeyError:
    retweeted_favourite_count = 0

Then pass the retweeted_favourite_count value to the writerow call.
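The try/except ("easier to ask forgiveness") approach generalizes to any nested path. Below is a Python 3 sketch with a hypothetical helper nested_get, which is not part of any Twitter library. It also catches TypeError because some fields, such as place and coordinates, can be present but null. Indexing None raises TypeError rather than KeyError.

```python
def nested_get(obj, *keys, default=0):
    """Walk nested dict keys; return `default` if any step is missing.

    Hypothetical helper. KeyError covers a missing key; TypeError covers
    an intermediate value that is None (e.g. a null 'place').
    """
    try:
        for key in keys:
            obj = obj[key]
    except (KeyError, TypeError):
        return default
    return obj


rt = {"retweeted_status": {"favorite_count": 7}}
print(nested_get(rt, "retweeted_status", "favorite_count"))                  # 7
print(nested_get({"text": "Hello"}, "retweeted_status", "favorite_count"))  # 0
print(repr(nested_get({"place": None}, "place", "full_name", default="")))  # ''
```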

Also, your CSV header row is missing a column name for the retweeted tweet's favorite count.
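One way to keep the header row and the data rows from drifting apart is to define each column once, as a (header, extractor) pair. The following Python 3 sketch shows the idea. The COLUMNS list, the write_tweets name and the 'retweeted-favorite_count' header are all hypothetical. io.StringIO stands in for the output file.

```python
import csv
import io

# Hypothetical column spec: each header sits next to the function that
# extracts its value, so adding a column updates header and rows together.
COLUMNS = [
    ("created_at", lambda t: t["created_at"]),
    ("text", lambda t: t["text"]),
    ("retweeted-favorite_count",
     lambda t: t["retweeted_status"]["favorite_count"]
     if "retweeted_status" in t else None),
]


def write_tweets(tweets, out):
    writer = csv.writer(out)
    writer.writerow([name for name, _ in COLUMNS])
    for tweet in tweets:
        # csv writes None as an empty cell.
        writer.writerow([extract(tweet) for _, extract in COLUMNS])


buf = io.StringIO()
write_tweets([
    {"created_at": "Sat Jan 16 12:00:00 +0000 2016", "text": "Hello"},
    {"created_at": "Sat Jan 16 12:05:00 +0000 2016", "text": "RT ...",
     "retweeted_status": {"favorite_count": 7}},
], buf)
print(buf.getvalue())
```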

Updated example:

for tweet in results:
    # Notice this is one long line, not two rows.
    retweeted_favourite_count = tweet['retweeted_status']['favorite_count'] if 'retweeted_status' in tweet else None
    csvwriter.writerow([tweet['created_at'],
                        getVal(tweet['user']['screen_name']),
                        getVal(tweet['text']),
                        getLng(tweet['coordinates']),
                        getLat(tweet['coordinates']),
                        getPlace(tweet['place']),
                        getVal(tweet['user']['location']),
                        getVal(tweet['user']['geo_enabled']),
                        getVal(tweet['user']['lang']),
                        getVal(tweet['user']['time_zone']),
                        getVal(tweet['user']['statuses_count']),
                        getVal(tweet['user']['followers_count']),
                        getVal(tweet['user']['created_at']),
                        # And insert it here instead
                        getVal(retweeted_favourite_count),
                        ])

You could also replace the line:

getVal(tweet['retweeted_status']['favorite_count'])

with the following, as Padriac Cunningham suggested:

getVal(tweet.get('retweeted_status', {}).get('favorite_count'))
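dict.get(key, default) returns the default instead of raising KeyError, so chaining through an empty dict yields None when retweeted_status is absent. The Python 3 sketch below uses hypothetical function names and sample dicts. One caveat: if a key can be present with a null value (place and coordinates can), .get("place", {}) returns None, and the second .get() then fails with AttributeError. The "or {}" fallback covers that case.

```python
def retweeted_favorites(tweet):
    # Missing 'retweeted_status' -> {} -> .get() returns None.
    return tweet.get("retweeted_status", {}).get("favorite_count")


def place_name(tweet):
    # 'place' may be present but None, so fall back to {} with `or`.
    return (tweet.get("place") or {}).get("full_name")


print(retweeted_favorites({"text": "Hello"}))                            # None
print(retweeted_favorites({"retweeted_status": {"favorite_count": 7}}))  # 7
print(place_name({"place": None}))                                       # None
print(place_name({"place": {"full_name": "London, England"}}))           # London, England
```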

Answer 1 (score: 0)

FYI, for anyone who finds this in the future: I got the code working with the version below. getVal(tweet['favorite_count']) gives the tweet's own favorite count.

q = "SkyNews"
results = twitter_user_timeline(twitter_api, q)
csvfile = open(q + '_timeline.csv', 'w')
csvwriter = csv.writer(csvfile)
csvwriter.writerow(['created_at',
                'user-screen_name',
                'text',
                'language',
                'coordinates lng',
                'coordinates lat',
                'place',
                'user-location',
                'user-geo_enabled',
                'user-lang',
                'user-time_zone',
                'user-statuses_count',
                'user-followers_count',
                'user-friend_count',
                'user-created_at', 
                'favorite_count',
                'retweet_count',
                'user-mentions',
                'urls',
                'hashtags',
                'symbols'])
for tweet in results:
    csvwriter.writerow([tweet['created_at'],
                    getVal(tweet['user']['screen_name']),
                    getVal(tweet['text']),
                    getVal(tweet['lang']),
                    getLng(tweet['coordinates']),
                    getLat(tweet['coordinates']),
                    getPlace(tweet['place']),
                    getVal(tweet['user']['location']),
                    getVal(tweet['user']['geo_enabled']),
                    getVal(tweet['user']['lang']),
                    getVal(tweet['user']['time_zone']),
                    getVal(tweet['user']['statuses_count']),
                    getVal(tweet['user']['followers_count']),
                    getVal(tweet['user']['friends_count']),
                    getVal(tweet['user']['created_at']), 
                    getVal(tweet['favorite_count']),
                    getVal(tweet['retweet_count']),
                    tweet['entities']['user_mentions'],
                    tweet['entities']['urls'],
                    tweet['entities']['hashtags'],
                    tweet['entities']['symbols'],
                    ])
csvfile.close()  # flush any buffered rows to disk

print "done"
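The last four columns above write the raw entity lists, so each cell ends up holding a Python repr, something like "[{'text': 'news', 'indices': [0, 5]}]". The Python 3 sketch below flattens them into readable strings instead. The helper name flatten_entities is hypothetical. The field names (screen_name, expanded_url, url, text) are those of the v1.1 entity objects. The helper returns the values in the same order as the 'user-mentions', 'urls', 'hashtags', 'symbols' header columns.

```python
def flatten_entities(entities):
    """Turn tweet['entities'] lists into space-separated strings.

    Hypothetical helper; returns [mentions, urls, hashtags, symbols].
    Missing lists become empty strings.
    """
    mentions = " ".join("@" + m["screen_name"]
                        for m in entities.get("user_mentions", []))
    # Prefer the expanded URL; fall back to the t.co link if it is null.
    urls = " ".join(u.get("expanded_url") or u.get("url", "")
                    for u in entities.get("urls", []))
    hashtags = " ".join("#" + h["text"] for h in entities.get("hashtags", []))
    symbols = " ".join("$" + s["text"] for s in entities.get("symbols", []))
    return [mentions, urls, hashtags, symbols]
```

To use it, the last four items of the writerow list would become `+ flatten_entities(tweet['entities'])` appended to the other columns.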

The helper functions getVal, getLng, getLat and getPlace used above are defined earlier in the script as:

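def getVal(val):
    # Pass bools and ints through unchanged; encode text to UTF-8 bytes
    # (Python 2's csv module does not handle unicode); falsy values become "".
    clean = ""
    if isinstance(val, bool):
        return val
    if isinstance(val, int):
        return val
    if val:
        clean = val.encode('utf-8')
    return clean

def getLng(val):
    # 'coordinates' is GeoJSON: [longitude, latitude]; None when not geotagged.
    if isinstance(val, dict):
        return val['coordinates'][0]

def getLat(val):
    if isinstance(val, dict):
        return val['coordinates'][1]

def getPlace(val):
    # 'place' is None when the tweet has no place attached.
    if isinstance(val, dict):
        return val['full_name'].encode('utf-8')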
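Under Python 3, the .encode('utf-8') calls should be dropped. The csv module writes str directly and stringifies anything else with str(), so encoded bytes would land in the file as b'...'. Instead, the file would be opened with open(path, 'w', newline='', encoding='utf-8'). Below is a Python 3 sketch of the same helpers, renamed to snake_case. The names are hypothetical, and io.StringIO stands in for the file.

```python
import csv
import io


def get_val(val):
    """Pass bools/ints through; map falsy values to ''; keep text as str."""
    if isinstance(val, (bool, int)):
        return val
    return val if val else ""


def get_lng(coords):
    # 'coordinates' is GeoJSON: {"type": "Point", "coordinates": [lng, lat]}
    if isinstance(coords, dict):
        return coords["coordinates"][0]
    return None


def get_lat(coords):
    if isinstance(coords, dict):
        return coords["coordinates"][1]
    return None


def get_place(place):
    if isinstance(place, dict):
        return place["full_name"]
    return None


buf = io.StringIO()
# None is written as an empty cell; non-ASCII text needs no encoding step.
csv.writer(buf).writerow([get_val("héllo"), get_val(None), get_lng(None)])
print(repr(buf.getvalue()))  # 'héllo,,\r\n'
```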