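I have about 200,000 imdb_id values in a file and want to fetch movie information for them from OMDb using its JSON API.
I wrote this code and it works correctly, but it is very slow (about 3 seconds per ID, so the whole run would take roughly 166 hours):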
import urllib.request
import csv
import datetime
from collections import defaultdict
i = 0
columns = defaultdict(list)
with open('a.csv', encoding='utf-8') as f:
    reader = csv.DictReader(f)
    for row in reader:
        for (k, v) in row.items():
            columns[k].append(v)
with open('a.csv', 'r', encoding='utf-8') as csvinput:
    with open('b.csv', 'w', encoding='utf-8', newline='') as csvoutput:
        writer = csv.writer(csvoutput)
        for row in csv.reader(csvinput):
            if row[0] == "item_id":
                writer.writerow(row + ["movie_info"])
            else:
                url = urllib.request.urlopen(
                    "http://www.omdbapi.com/?i=tt" + str(columns['item_id'][i]) + "&apikey=??????").read()
                url = url.decode('utf-8')
                writer.writerow((row + [url]))
                i = i + 1
What is the fastest way to get movie information from OMDb using Python?

Edit: I wrote the code below, and after receiving 1022 URL responses I got this error:
import grequests
urls = open("a.csv").readlines()
api_key = '??????'
def exception_handler(request, exception):
    print("Request failed")
# read the file and turn each line into a request URL
for i in range(len(urls)):
    urls[i] = "http://www.omdbapi.com/?i=tt" + str(urls[i]).rstrip('\n') + "&apikey=" + api_key
requests = (grequests.get(u) for u in urls)
responses = grequests.map(requests, exception_handler=exception_handler)
with open('b.json', 'wb') as outfile:
    for response in responses:
        outfile.write(response.content)
The error is:
Traceback (most recent call last):
  File "C:/python_apps/omdb_async.py", line 18, in <module>
    outfile.write(response.content)
AttributeError: 'NoneType' object has no attribute 'content'
How can I fix this error?
Answer 0 (score: 2)
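This code is I/O-bound and would benefit greatly from Python's async/await features. You can iterate over your collection of URLs and create an asynchronously executed request for each one, as in the example in this SO question.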
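As a rough illustration of that idea, here is a minimal sketch that uses only the standard library. It assumes Python 3.9+ (for `asyncio.to_thread`), and the names `API_URL`, `fetch_blocking`, `fetch_one` and `fetch_all` are made up for this example. Each blocking `urllib` call runs in a worker thread, so concurrency is capped by the default thread pool; an async HTTP client could be swapped in through the `fetch` parameter.

```python
import asyncio
import urllib.request

# URL pattern taken from the question; api_key is supplied by the caller.
API_URL = "http://www.omdbapi.com/?i=tt{imdb_id}&apikey={api_key}"


def fetch_blocking(url, timeout=10):
    """Blocking HTTP GET using only the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


async def fetch_one(imdb_id, api_key, fetch=fetch_blocking):
    """Fetch one record in a worker thread; return None if the request fails."""
    url = API_URL.format(imdb_id=imdb_id, api_key=api_key)
    try:
        return await asyncio.to_thread(fetch, url)
    except Exception as exc:  # network errors, HTTP errors, timeouts
        print(f"Request failed for {imdb_id}: {exc}")
        return None


async def fetch_all(imdb_ids, api_key, fetch=fetch_blocking):
    """Run one coroutine per id concurrently; results keep the input order."""
    tasks = [fetch_one(imdb_id, api_key, fetch) for imdb_id in imdb_ids]
    return await asyncio.gather(*tasks)
```

Calling `asyncio.run(fetch_all(ids, api_key))` returns a list in which failed requests show up as `None`. Skipping those entries before writing (`if body is not None`) avoids the kind of `AttributeError: 'NoneType' object has no attribute 'content'` shown in the question, which most likely comes from `grequests.map` putting `None` in the result list for requests that failed.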
Once you are issuing these requests asynchronously, you may need to throttle your request rate to stay within the OMDb API limits.
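One possible way to throttle, sketched below, is to combine a semaphore (which caps how many requests are in flight) with a small limiter that spaces out request starts. The names `RateLimiter` and `throttled_map` are hypothetical, and the default values for `rate_per_sec` and `max_in_flight` are placeholders rather than actual OMDb limits; `worker` stands for any async function that performs a single request.

```python
import asyncio
import time


class RateLimiter:
    """Space out calls so that at most `rate_per_sec` start per second."""

    def __init__(self, rate_per_sec):
        self._interval = 1.0 / rate_per_sec
        self._lock = asyncio.Lock()
        self._next_time = 0.0  # earliest monotonic time the next call may start

    async def wait(self):
        # The lock serializes callers, so start times are spaced one by one.
        async with self._lock:
            now = time.monotonic()
            delay = self._next_time - now
            if delay > 0:
                await asyncio.sleep(delay)
                now = time.monotonic()
            self._next_time = now + self._interval


async def throttled_map(items, worker, rate_per_sec=10, max_in_flight=20):
    """Apply async `worker` to every item under a rate and concurrency cap."""
    # Created inside the coroutine so they belong to the running event loop.
    limiter = RateLimiter(rate_per_sec)
    semaphore = asyncio.Semaphore(max_in_flight)

    async def run(item):
        async with semaphore:
            await limiter.wait()
            return await worker(item)

    # gather preserves the order of `items` in the returned list.
    return await asyncio.gather(*(run(item) for item in items))
```

With this shape, the per-request coroutine stays unaware of the limits, so the rate can be tuned in one place once the actual quota for your API key is known.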