What is the fastest way to get movie information from OMDb using Python?

Time: 2017-07-23 00:25:50

Tags: python json api csv movie

I have about 200,000 imdb_id values in a file and want to fetch the OMDb information for each of them through the JSON API.

I wrote the code below and it works, but it is very slow (about 3 seconds per ID, so roughly 166 hours in total):

import urllib.request
import csv
import datetime
from collections import defaultdict


i = 0
columns = defaultdict(list)
with open('a.csv', encoding='utf-8') as f:
    reader = csv.DictReader(f)
    for row in reader:
        for (k, v) in row.items():
            columns[k].append(v)
with open('a.csv', 'r', encoding='utf-8') as csvinput:
    with open('b.csv', 'w', encoding='utf-8', newline='') as csvoutput:
        writer = csv.writer(csvoutput)
        for row in csv.reader(csvinput):
            if row[0] == "item_id":
                writer.writerow(row + ["movie_info"])
            else:
                url = urllib.request.urlopen(
                    "http://www.omdbapi.com/?i=tt" + str(columns['item_id'][i]) + "&apikey=??????").read()
                url = url.decode('utf-8')
                writer.writerow((row + [url]))
                i = i + 1

What is the fastest way to get movie information from OMDb using Python?

Edit: I wrote the following code, and after receiving 1022 URL responses I got this error:

import grequests

urls = open("a.csv").readlines()
api_key = '??????'


def exception_handler(request, exception):
    print("Request failed")


# Turn each line read from the file (one ID per line) into a request URL
for i in range(len(urls)):
    urls[i] = "http://www.omdbapi.com/?i=tt" + str(urls[i]).rstrip('\n') + "&apikey=" + api_key
requests = (grequests.get(u) for u in urls)
responses = grequests.map(requests, exception_handler=exception_handler)
with open('b.json', 'wb') as outfile:
    for response in responses:
        outfile.write(response.content)

The error is:

Traceback (most recent call last):
  File "C:/python_apps/omdb_async.py", line 18, in <module>
    outfile.write(response.content)
AttributeError: 'NoneType' object has no attribute 'content'

How can I fix this error?

1 answer:

Answer 0 (score: 2)

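This code is I/O-bound: nearly all of its time is spent waiting on the network, so it would benefit greatly from Python's async/await features. You can iterate over your collection of URLs and create an asynchronously executed request for each one, as in the example in this SO question.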
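As a rough illustration of that idea, here is a minimal sketch using only the standard library. It is not taken from the linked question: it assumes the blocking urllib call from the question is simply pushed onto asyncio's default thread pool (a true async HTTP client would avoid threads altogether), and the names fetch_blocking, fetch_all, and max_concurrent are made up for this example.

```python
import asyncio
import urllib.request

# URL pattern copied from the question; the API key is supplied by the caller.
API_URL = "http://www.omdbapi.com/?i=tt{imdb_id}&apikey={api_key}"


def fetch_blocking(url, timeout=10):
    """Blocking HTTP GET that returns the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


async def fetch_all(imdb_ids, api_key, fetch=fetch_blocking, max_concurrent=20):
    """Fetch all IDs concurrently; a failed request yields None instead of raising."""
    # The semaphore caps how many requests are in flight at once
    # (20 is an arbitrary placeholder, not an OMDb recommendation).
    semaphore = asyncio.Semaphore(max_concurrent)
    loop = asyncio.get_running_loop()

    async def fetch_one(imdb_id):
        url = API_URL.format(imdb_id=imdb_id, api_key=api_key)
        async with semaphore:
            try:
                # Run the blocking call in the default thread pool.
                return await loop.run_in_executor(None, fetch, url)
            except Exception as exc:
                print(f"Request failed for {imdb_id}: {exc}")
                return None

    # gather() returns results in the same order as imdb_ids.
    return await asyncio.gather(*(fetch_one(i) for i in imdb_ids))
```

Calling asyncio.run(fetch_all(ids, api_key)) returns one entry per ID. Entries for failed requests are None and have to be skipped before writing. The same applies to grequests.map in the question, which returns None for requests that failed, and that is what triggers the AttributeError there.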

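Once you are issuing these requests asynchronously, you may need to throttle your request rate so that it stays within the OMDb API limits.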
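One possible way to throttle is to space out the start of each request evenly. The sketch below is only an assumption about how that could look: RateLimiter and throttled_map are hypothetical names, and the rate value is a placeholder to be replaced with whatever limit applies to your OMDb API key.

```python
import asyncio
import time


class RateLimiter:
    """Space out calls so that at most `rate` of them start per second."""

    def __init__(self, rate):
        self._interval = 1.0 / rate
        self._next_slot = 0.0

    async def wait(self):
        now = time.monotonic()
        # Reserve the next free slot. Nothing is awaited between reading and
        # updating _next_slot, so this is safe within a single event loop.
        slot = max(now, self._next_slot)
        self._next_slot = slot + self._interval
        await asyncio.sleep(slot - now)


async def throttled_map(func, items, rate):
    """Call the coroutine function `func` on every item, rate-limited."""
    limiter = RateLimiter(rate)

    async def run(item):
        await limiter.wait()
        return await func(item)

    # Results come back in the same order as `items`.
    return await asyncio.gather(*(run(item) for item in items))
```

Here func would be a coroutine function that fetches a single ID. With rate=10, for example, the 200,000 requests would start over roughly 20,000 seconds (about 5.6 hours), still far below the 166 hours of the sequential version.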