Unable to reload data as a CSV file from an IPython Notebook

Asked: 2014-12-14 06:23:44

Tags: api ipython ipython-notebook

I have the following IPython Notebook, in which I am trying to build a database of movie reviews from the Rotten Tomatoes API.

However, Rotten Tomatoes limits usage to 10,000 API requests per day.

To avoid re-running this fetch every time I restart the notebook, I am trying to save the data to a CSV file and reload it from there. When I run the cell that writes the CSV, the notebook shows the busy indicator [*], and after a while I get the following error:

ConnectionError: HTTPConnectionPool(host='api.rottentomatoes.com', port=80): Max retries exceeded with url: /api/public/v1.0/movie_alias.json?apikey=5xr26r2qtgf9h3kcq5kt6y4v&type=imdb&id=0113845 (Caused by <class 'socket.gaierror'>: [Errno 11002] getaddrinfo failed)

Is this caused by a slow internet connection? Should I make some changes to my code? Please help.
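For context, one common workaround I have tried to reason about is retrying the flaky call with a pause, since `getaddrinfo failed` is often a transient DNS/network hiccup. A minimal sketch (the `get_with_retries` wrapper is hypothetical, not part of the notebook):

```python
import time

def get_with_retries(fetch, max_retries=3, wait=5):
    # fetch: any zero-argument callable that performs the request,
    # e.g. lambda: requests.get(url, params=params).text
    for attempt in range(max_retries):
        try:
            return fetch()
        except Exception:
            # last attempt: give up and propagate the error
            if attempt == max_retries - 1:
                raise
            # otherwise pause before retrying, to ride out transient failures
            time.sleep(wait)
```

This only papers over intermittent outages; if the connection is down for good, the final attempt still raises.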

The code in the notebook is as follows:

%matplotlib inline

import json

import requests
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

api_key = '5xr26r2qtgf9h3kcq5kt6y4v'
movie_id = '770672122'  # toy story 3
url = 'http://api.rottentomatoes.com/api/public/v1.0/movies/%s/reviews.json' % movie_id

#these are "get parameters"
options = {'review_type': 'top_critic', 'page_limit': 20, 'page': 1, 'apikey': api_key}
data = requests.get(url, params=options).text
data = json.loads(data)  # load a json string into a collection of lists and dicts
print json.dumps(data['reviews'][0], indent=2)   # dump an object into a json string

from io import StringIO  
movie_txt = requests.get('https://raw.github.com/cs109/cs109_data/master/movies.dat').text
movie_file = StringIO(movie_txt) # treat a string like a file
movies = pd.read_csv(movie_file,delimiter='\t')
movies
# display selected columns
movies[['id', 'title', 'imdbID', 'year']]

def base_url():
    return 'http://api.rottentomatoes.com/api/public/v1.0/'

def rt_id_by_imdb(imdb):
    """
    Queries the RT movie_alias API. Returns the RT id associated with an IMDB ID,
    or raises a KeyError if no match was found
    """
    url = base_url() + 'movie_alias.json'

    imdb = "%7.7i" % imdb
    params = dict(id=imdb, type='imdb', apikey=api_key)

    r = requests.get(url, params=params).text
    r = json.loads(r)

    return r['id']


def _imdb_review(imdb):
    """
    Query the RT reviews API, to return the first page of reviews 
    for a movie specified by its IMDB ID

    Returns a list of dicts
    """    
    rtid = rt_id_by_imdb(imdb)
    url = base_url() + 'movies/{0}/reviews.json'.format(rtid)

    params = dict(review_type='top_critic',
                  page_limit=20,
                  page=1,
                  country='us',
                  apikey=api_key)
    data = json.loads(requests.get(url, params=params).text)
    data = data['reviews']
    data = [dict(fresh=r['freshness'], 
                 quote=r['quote'], 
                 critic=r['critic'], 
                 publication=r['publication'], 
                 review_date=r['date'],
                 imdb=imdb, rtid=rtid
                 ) for r in data]
    return data

def fetch_reviews(movies, row):
    m = movies.irow(row)
    try:
        result = pd.DataFrame(_imdb_review(m['imdbID']))
        result['title'] = m['title']
    except KeyError:
        return None
    return result

def build_table(movies, rows):
    dfs = [fetch_reviews(movies, r) for r in range(rows)]
    dfs = [d for d in dfs if d is not None]
    return pd.concat(dfs, ignore_index=True)


critics = build_table(movies, 3000)
critics.to_csv('critics.csv', index=False)
critics = pd.read_csv('critics.csv')
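One way to get the save-and-reload behavior described above is to only hit the API when no cached CSV exists yet. A sketch, assuming the `build_table` and `movies` names from the notebook (`load_or_build` itself is a hypothetical helper):

```python
import os
import pandas as pd

def load_or_build(path, builder):
    # reuse the cached CSV if it exists; otherwise call builder()
    # once, save the result, and return it
    if os.path.exists(path):
        return pd.read_csv(path)
    df = builder()
    df.to_csv(path, index=False)
    return df

# usage in the notebook would look like:
# critics = load_or_build('critics.csv', lambda: build_table(movies, 3000))
```

With this pattern, restarting the notebook reads `critics.csv` from disk and makes no API requests at all.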

0 Answers:

No answers yet