List index out of range error (even though the value exists)?

Asked: 2012-08-17 13:05:33

Tags: xml google-app-engine xml-parsing

I'm running a for loop to pull content out of some XML, and it works fine until I reach the 29th iteration. At that point it gives me this error:

File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2\webapp2.py", line 572, in dispatch
  return self.handle_exception(e, self.app.debug)   
File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2\webapp2.py", line 570, in dispatch
  return method(*args, **kwargs)   
File "J:\Art & Graphic Design\Graphic Design\Websites\lawvoter-dev\cron_congressman.py", line 64, in get
  birthday      = re.findall("<birthday>(.*)</birthday>",element)[0] 
IndexError: list index out of range

The code is:

for element in members:
            title         = re.findall("<title>(.*)</title>",element)[0]
            role          = re.findall("<role_type_label>(.*)</role_type_label>",element)[0]
            name_sortable = re.findall("<name_sortable>(.*)</name_sortable>",element)[0]
            firstname     = re.findall("<firstname>(.*)</firstname>",element)[0]
            lastname      = re.findall("<lastname>(.*)</lastname>",element)[0]
            gender        = re.findall("<gender_label>(.*)</gender_label>",element)[0]
            birthday      = re.findall("<birthday>(.*)</birthday>",element)[0]
            party         = re.findall("<party>(.*)</party>",element)[0]
            state         = re.findall("<state>(.*)</state>",element)[0]
            description   = re.findall("<description>(.*)</description>",element)[0]
            start_date    = re.findall("<startdate>(.*)</startdate>",element)[0]
            end_date      = re.findall("<enddate>(.*)</enddate>",element)[0]
            website       = re.findall("<website>(.*)</website>",element)[0]
            bioguideid    = re.findall("<bioguideid>(.*)</bioguideid>",element)[0]
            osid          = re.findall("<osid>(.*)</osid>",element)[0]
            pvsid         = re.findall("<pvsid>(.*)</pvsid>",element)[0]
            twitterid     = re.findall("<twitterid>(.*)</twitterid>",element)[0]
            youtubeid     = re.findall("<youtubeid>(.*)</youtubeid>",element)[0]

            member = Congressman(title=title, role=role, name_sortable=name_sortable, firstname=firstname, lastname=lastname, gender=gender, birthday=birthday, party=party, state=state,
                                 description=description, start_date=start_date, end_date=end_date, website=website, bioguideid=bioguideid, osid=osid, pvsid=pvsid, twitterid=twitterid, youtubeid=youtubeid)
            member.put()

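I really don't understand why this error occurs, given that the loop works for the first 28 iterations. Just in case, every property in the data model is also set to "default = None". But when I look at the XML itself and go to the exact line where the error occurs, the value is actually there. Does anyone know why it would fail even though the value exists?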

1 answer:

Answer 0 (score: 1)

It looks like

birthday      = re.findall("<birthday>(.*)</birthday>",element)[0]

returns an empty list for that record. You then try to take the first element of a list that has none, which raises

IndexError: list index out of range

just like here:

>>> l = []
>>> l[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>> 
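
The failing record isn't shown in the question, so the following is only a guess at how the tag can be visible in the XML while `re.findall` still comes back empty. By default `.` does not match a newline, so a value that wraps onto the next line will not match, and a self-closing `<birthday/>` has no closing tag to match at all. The sample strings below are made up for illustration:

```python
import re

pattern = "<birthday>(.*)</birthday>"

# Hypothetical records, not taken from the real feed
same_line    = "<birthday>1950-01-02</birthday>"
multi_line   = "<birthday>1950-01-02\n</birthday>"   # value followed by a line break
self_closing = "<birthday/>"                         # tag present, but empty

print(re.findall(pattern, same_line))     # ['1950-01-02']
print(re.findall(pattern, multi_line))    # [] because '.' stops at the newline
print(re.findall(pattern, self_closing))  # [] because there is no </birthday>

# re.DOTALL lets '.' match newlines too
print(re.findall(pattern, multi_line, re.DOTALL))  # ['1950-01-02\n']
```

Logging `element` for the record that fails should show which of these cases, if any, applies.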

Edit:

import re, logging

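def findelement(item, element):
    # item is the regex pattern, element is the XML text of one member
    i = re.findall(item, element)
    if not i:
        # no match: log it and fall back to an empty string instead of raising
        logging.info('no item found for %s with element %s' %(item, element))
        return ''
    return i[0]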


for element in members:
    title = findelement("<title>(.*)</title>", element)
    ...
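
As an alternative to regular expressions, here is a minimal sketch that uses the standard library's `xml.etree.ElementTree` so that missing or empty tags don't raise. The `field` helper, the `<objects>`/`<object>` wrapper names and the sample data are assumptions for illustration; the real feed's structure isn't shown in the question, so adjust the tag names to match it.

```python
import xml.etree.ElementTree as ET

def field(node, tag, default=''):
    """Return the stripped text of the first child <tag>, or default if missing or empty."""
    text = node.findtext(tag)  # None if the tag is absent, '' if it has no text
    if text and text.strip():
        return text.strip()
    return default

# Hypothetical sample; the real XML layout is an assumption
sample = """<objects>
  <object><firstname>Jane</firstname><birthday>1950-01-02</birthday></object>
  <object><firstname>John</firstname><birthday/></object>
</objects>"""

root = ET.fromstring(sample)
rows = [(field(m, 'firstname'), field(m, 'birthday')) for m in root.iter('object')]
print(rows)  # [('Jane', '1950-01-02'), ('John', '')]
```

A parser also copes with values that span several lines, which the single-line regex does not.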