I'm scraping tweets with the script below. When I run it I get the following two errors, but the one I'm most concerned about is the KeyError:
```
Traceback (most recent call last):
  File "Exporter.py", line 76, in main
    got3.manager.TweetManager.getTweets(tweetCriteria, receiveBuffer)
  File "/Users/MacbookPro/PycharmProjects/cs446/trend_map-master/getoldtweets/got3/manager/TweetManager.py", line 40, in getTweets
    json_to_use = ((json['items_html']).format('div.js-stream-tweet'))
KeyError: '"id_str"'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "Exporter.py", line 85, in <module>
    main(sys.argv[1:])
  File "Exporter.py", line 78, in main
    except arg:
TypeError: catching classes that do not inherit from BaseException is not allowed
```
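For what it's worth, the second error looks like a side effect of the first. Python only evaluates the expression after `except` once an exception is actually propagating, so if `arg` on line 78 of Exporter.py is bound to something that is not an exception class (I'm assuming a string here, since that part of Exporter.py isn't shown), the KeyError turns into a TypeError instead of being caught. A minimal reproduction, where `handler_demo` and its `arg` parameter are hypothetical stand-ins:

```python
def handler_demo(arg):
    """Reproduce `except arg:` where `arg` may not be an exception class.

    `arg` is a hypothetical stand-in for whatever Exporter.py binds to that name.
    """
    try:
        try:
            raise KeyError('"id_str"')  # stands in for the KeyError from getTweets
        except arg:  # evaluated only now, while the KeyError is in flight
            return "caught"
    except TypeError as err:
        # Raised by the except clause itself when `arg` is not a BaseException subclass.
        return str(err)


print(handler_demo("not an exception class"))
# catching classes that do not inherit from BaseException is not allowed
print(handler_demo(KeyError))
# caught
```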
Here is the code:
```python
@staticmethod
def getTweets(tweetCriteria, receiveBuffer=None, bufferLength=100):
    refreshCursor = ''
    results = []
    resultsAux = []
    cookieJar = http.cookiejar.CookieJar()
    active = True
    while active:
        json = TweetManager.getJsonReponse(tweetCriteria, refreshCursor, cookieJar)
        if len(json['items_html'].strip()) == 0:
            break
        refreshCursor = json['min_position']
        # print(json['items_html']('div.js-stream-tweet'))
        json_to_use = ((json['items_html']).format('div.js-stream-tweet'))
        tweets = PyQuery(json_to_use)
        # print(tweets)
        # print(len(tweets))
        if len(tweets) == 0:
            break
        for tweetHTML in tweets:
            tweetPQ = PyQuery(tweetHTML)
            tweet = models.Tweet()
            usernameTweet = tweetPQ("span.username.js-action-profile-name b").text()
            txt = re.sub(
                r"\s+", " ", tweetPQ("p.js-tweet-text").text().replace('# ', '#').replace('@ ', '@'))
            retweets = int(tweetPQ("span.ProfileTweet-action--retweet span.ProfileTweet-actionCount").attr(
                "data-tweet-stat-count").replace(",", ""))
            favorites = int(tweetPQ("span.ProfileTweet-action--favorite span.ProfileTweet-actionCount").attr(
                "data-tweet-stat-count").replace(",", ""))
            dateSec = int(tweetPQ("small.time span.js-short-timestamp").attr("data-time"))
            id = tweetPQ.attr("data-tweet-id")
            permalink = tweetPQ.attr("data-permalink-path")
            user_id = int(tweetPQ("a.js-user-profile-link").attr("data-user-id"))
            geo = TweetManager.findLocation(usernameTweet)
            urls = []
            for link in tweetPQ("a"):
                try:
                    urls.append((link.attrib["data-expanded-url"]))
                except KeyError:
                    pass
            tweet.id = id
            tweet.permalink = 'https://twitter.com' + permalink
            tweet.username = usernameTweet
            tweet.text = txt
            tweet.date = datetime.datetime.fromtimestamp(dateSec)
            tweet.formatted_date = datetime.datetime.fromtimestamp(
                dateSec).strftime("%a %b %d %X +0000 %Y")
            tweet.retweets = retweets
            tweet.favorites = favorites
            tweet.mentions = " ".join(re.compile('(@\\w*)').findall(tweet.text))
            tweet.hashtags = " ".join(re.compile('(#\\w*)').findall(tweet.text))
            tweet.geo = geo
            tweet.urls = ",".join(urls)
            tweet.author_id = user_id
            tweet.tweetPQ = tweetPQ
            tweet.rawhtml = tweetHTML
            tweet.tweets = tweets
            tweet.alljson = json
            results.append(tweet)
            resultsAux.append(tweet)
            if receiveBuffer and len(resultsAux) >= bufferLength:
                receiveBuffer(resultsAux)
                resultsAux = []
            if tweetCriteria.maxTweets > 0 and len(results) >= tweetCriteria.maxTweets:
                active = False
                break
    if receiveBuffer and len(resultsAux) > 0:
        receiveBuffer(resultsAux)
    return results
```
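My best guess at the KeyError, assuming `json['items_html']` contains literal curly braces such as inline JSON (`{"id_str": ...}`): `str.format` treats every `{...}` as a replacement field, so it looks up a keyword argument literally named `"id_str"` and raises `KeyError: '"id_str"'`. The HTML string and the `data-mentions` attribute below are invented for the demo, not real Twitter output.

```python
# Hypothetical stand-in for json['items_html']: tweet HTML that embeds raw JSON in an attribute.
ITEMS_HTML = '<div class="js-stream-tweet" data-mentions=\'[{"id_str":"123"}]\'></div>'


def format_like_line_40(html):
    """Apply str.format the same way TweetManager.py line 40 does and report the outcome."""
    try:
        return html.format('div.js-stream-tweet')
    except KeyError as err:
        # '{"id_str":"123"}' is parsed as a replacement field named '"id_str"',
        # and no keyword argument with that name was passed to format().
        return f"KeyError: {err}"


print(format_like_line_40(ITEMS_HTML))
# KeyError: '"id_str"'

# Without braces, str.format ignores the unused positional argument and returns
# the HTML unchanged, so it never selects 'div.js-stream-tweet' elements either way.
print(format_like_line_40('<div class="js-stream-tweet"></div>'))
# <div class="js-stream-tweet"></div>
```

If that's right, `.format()` was never doing element selection in the first place; the commented-out line in the loop suggests the intent was to select the `div.js-stream-tweet` elements from the HTML with PyQuery rather than to format a string.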
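I'm not sure what the problem is. I tried searching for it, but I couldn't find anything suggesting the problem is mainly in how I format the request. Any help would be greatly appreciated!!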