IndexError: list index out of range when running on AWS

Asked: 2019-02-12 07:56:11

Tags: python amazon-web-services web-scraping

When I run this code in Jupyter and on a virtual machine, it runs smoothly. But when I run it on AWS, it always fails with "list index out of range". How can I fix this? Thanks!

Code:

from datetime import datetime, timedelta
from time import strptime
import requests
from lxml import html
import re
import time
import os
import sys

from pandas import DataFrame
import numpy as np
import pandas as pd

import sqlalchemy as sa
from sqlalchemy import create_engine
from sqlalchemy.sql import text as sa_text
import pymysql


date_list=[]
for i in range(0,2):
    duration=datetime.today() - timedelta(days=i)
    forma=duration.strftime("%m-%d")
    date_list.append(forma)

print(date_list)



def curl_topic_url_hot():
    url = 'https://www.xxxx.com/topiclist.php?f=397&p=1'
    headers = {'User-Agent': 'aaaaaaaaaaaaaaa'}
    response = requests.get(url, headers=headers)
    tree = html.fromstring(response.text)
    output = tree.xpath("//div[@class='pagination']/a[7]")
    maxPage = int(output[0].text)
    print('There are', maxPage, 'pages.')

    return [maxPage]

topic_url_hot = curl_topic_url_hot()

AWS log:

['02-12', '02-11']
Traceback (most recent call last):
  File "/home/hadoop/ellen_crawl/test0211_mobile.py", line 167, in <module>
    topic_url_hot = curl_topic_url_hot()
  File "/home/hadoop/ellen_crawl/test0211_mobile.py", line 48, in curl_topic_url_hot
    maxPage = int(output[0].text)
IndexError: list index out of range

When I run this code in Jupyter, it prints:

['02-12', '02-11']
There are 818 pages.

3 Answers:

Answer 0: (score: 3)

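You can either check that the list is non-empty before indexing it:

if len(output) > 0:
    maxPage = int(output[0].text)

or catch the exception:

try:
    maxPage = int(output[0].text)
except IndexError:
    pass  # do something with the error message

Either way, your original code does not produce the result you think it does.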
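
As a rough illustration of the guard described above, the check can be wrapped in a small helper that returns None instead of raising when the XPath query matches nothing. The name extract_max_page is hypothetical, and the SimpleNamespace objects are stand-ins for lxml elements (only their .text attribute is used), so this sketch runs without fetching the page.

```python
from types import SimpleNamespace


def extract_max_page(nodes):
    """Return the page count from an XPath result list, or None if unavailable.

    `nodes` is expected to be the list returned by tree.xpath(...); only the
    `.text` attribute of the first element is read.
    """
    if not nodes:
        # The XPath matched nothing, e.g. the server returned a different page.
        return None
    text = (nodes[0].text or "").strip()
    # Only convert when the text is purely decimal digits, so int() cannot fail.
    return int(text) if text.isdecimal() else None


# Stand-in elements for illustration; real code would pass tree.xpath(...) output.
print(extract_max_page([SimpleNamespace(text="818")]))
print(extract_max_page([]))
```

A None result is a signal to inspect what the server actually sent back, rather than something to ignore silently.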

Answer 1: (score: 3)

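You can get rid of the error either by testing first and indexing into the result only when it is non-empty, or by catching the error with try/except:

if len(output) > 0:
    maxPage = int(output[0].text)

or

try:
    maxPage = int(output[0].text)
except IndexError as e:
    pass  # log it or do something with it

Your actual problem lies elsewhere:

Your request does not return what you think it does. Maybe AWS does not support something you are trying to do, so the request is blocked and returns nothing? Maybe there is a typo in your URL?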

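Some ideas:

  • Check the content of tree.
  • Check your AWS logs.
  • Check the status code of response.
  • Try the URL manually (you have already done this; this item is more for others who find this question later).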
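
A minimal sketch of the response checks listed above, assuming a requests-style response object with status_code and text attributes. The helper name summarize_response and the 403 body below are made up for illustration; they are not what the site is known to return.

```python
from types import SimpleNamespace


def summarize_response(response, preview_chars=200):
    """Build a short diagnostic summary of a requests-style response."""
    body = response.text or ""
    return {
        "status_code": response.status_code,
        # Treat only 2xx codes as success.
        "ok": 200 <= response.status_code < 300,
        "body_length": len(body),
        # The start of the body usually shows whether this is an error page.
        "body_preview": body[:preview_chars],
    }


# Fabricated stand-in for a blocked request; in the real script, pass the
# object returned by requests.get(url, headers=headers) instead.
fake = SimpleNamespace(status_code=403, text="<html><body>Access denied</body></html>")
summary = summarize_response(fake)
print(summary["status_code"], summary["ok"], summary["body_preview"])
```

Logging this summary on both the working machine and the AWS machine makes it easier to see whether the two receive the same page.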

Answer 2: (score: -1)

When your AWS machine accesses this site, it gets back an error HTML page. Please check it: https://www.xxxx.com/topiclist.php?f=397&p=1