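I built a web scraper, and it works fine until it reaches the step that sorts through the data dumped into e_data. I'm a complete Python newbie, so any help would be appreciated.
Error: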
Traceback (most recent call last):
  File "C:\wamp\www\_clients\dstest\web_scrape2.py", line 78, in <module>
    customer = row.find_all('td')[2].getText().split()
IndexError: list index out of range
The failing code:
if re.findall('\\bnew\\b', str(e_data)) != []:
    for row in e_data.find_all('tr'):
        if re.findall('</table>', str(row)) == [] and re.findall('\\bnew\\b', str(row)) != []:
            job_no = row.find('a').string
            customer = row.find_all('td')[2].getText().split()
            move_date = row.find_all('td')[3].getText()
            result = {'job_no': job_no, 'customer': customer, 'move_date': move_date}
            print json.dumps(result)
else:
    print "Data unavailable"
Contents of e_data:
<center><b><h3>
Total of: 1 Transactions
<right>
<a href="javascript:window.close()">Exit</a>
<style type="text/css"> .xf{color: blue; text-decoration: underline;} .xn{color: red; text-decoration: underline; cursor: Hand}></style>
</right></h3></b></center>
<table align="center" border="0" cellspacing="0" width="90%"><tr><td>
<center>
<table bgcolor="#EEEEEE" border="1" cellpadding="3" cellspacing="0" style="font-size: 8pt" width="100">
<tr bgcolor="DarkBlue"><th><font color="White" face="Verdana,Helvetica">job_no</font></th><th><font color="White" face="Verdana,Helvetica">category</font></th><th><font color="White" face="Verdana,Helvetica">customer</font></th><th><font color="White" face="Verdana,Helvetica">move_date</font></th><th><font color="White" face="Verdana,Helvetica">deliver</font></th><th><font color="White" face="Verdana,Helvetica">dlv_imm</font></th><th><font color="White" face="Verdana,Helvetica">origin</font></th><th><font color="White" face="Verdana,Helvetica">destination</font></th><th><font color="White" face="Verdana,Helvetica">miles</font></th><th><font color="White" face="Verdana,Helvetica">cf_lbs</font></th><th><font color="White" face="Verdana,Helvetica">estimate</font></th><th><font color="White" face="Verdana,Helvetica">open_date</font></th><th><font color="White" face="Verdana,Helvetica">vip</font></th></tr><tr style="background:#CCCCFF" valign="TOP"><td><a href="/wc.dll?mprep~printselect~LTPAX57752~UZ2W225186" target="_blank">J4074407</a><br> <b><br><font color="#FF0000">new</font></br></b></br></td></tr></table></center></td></tr></table><td>Long_Dist.<br>FollowUp<br><b><font color="#008000" size="1">REFERENCE</font></b></br></br></td><td><b>Newlead2</b><br>User:SAM <br>newlead2@gmail.com <br>4838484838</br></br></br></td><td>01/18/2016</td><td> / / <br> / /</br></td><td>...</td><td><b>FL FORT LAUDERDALE </b></td><td><b>CA OAKLAND </b></td><td align="RIGHT">3068</td><td>200 cf<br>2000 lbs</br></td><td align="RIGHT">1410.00</td><td align="CENTER">02/18/2015 02:27:05 pm</td><td>...</td>
<tr bgcolor="DarkBlue" style="font-face:bold"><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td align="RIGHT"><font color="White"><b> 3068</b></font></td><td></td><td align="RIGHT"><font color="White"><b> 1410.00</b></font></td><td></td><td></td></tr>
Answer 0 (score: 0)
You are trying to access a list element that does not exist:
customer = row.find_all('td')[2].getText().split()
Print the length of the list and you will see:
len(row.find_all('td'))
Use the following to iterate over all the 'td' elements:
tdElements = row.find_all('td')
for tdElement in tdElements:
    # your code here
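The length check suggested above can also be folded into the loop, so that short rows are skipped instead of raising. Here is a minimal sketch in Python 3, assuming each row has already been reduced to a list of cell strings (for example with `[td.get_text() for td in row.find_all('td')]`). The helper name `pick_fields` and the sample rows are hypothetical, not part of the original scraper:

```python
def pick_fields(cells, customer_idx=2, date_idx=3):
    """Return (customer, move_date), or None when the row has too few cells."""
    needed = max(customer_idx, date_idx) + 1
    if len(cells) < needed:
        return None  # row is too short; let the caller decide what to do
    return cells[customer_idx].split(), cells[date_idx]


# Hypothetical rows: the first mimics a truncated row with a single cell.
rows = [
    ["J4074407 new"],
    ["J4074407", "Long_Dist.", "Newlead2 User:SAM", "01/18/2016"],
]

for cells in rows:
    fields = pick_fields(cells)
    if fields is None:
        print("skipping row with", len(cells), "cell(s)")
        continue
    customer, move_date = fields
    print(customer, move_date)
```

Checking the length first makes it explicit which rows are being ignored, which can help when the HTML is malformed and cells end up outside the row you expected.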
Answer 1 (score: 0)
Your first tr row does not contain any td elements, so the index goes out of range when you access row.find_all('td')[n].
Try using try..except:
try:
    customer = row.find_all('td')[2].getText().split()
    move_date = row.find_all('td')[3].getText()
except IndexError:
    continue
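For context, this is roughly how that try..except could sit inside a complete loop. It is only a sketch in Python 3, under the assumption that each row is a plain list of cell strings standing in for the result of `row.find_all('td')`. The function name `collect_jobs` and the sample data are hypothetical:

```python
import json


def collect_jobs(rows):
    """Build one dict per row, skipping rows that lack the expected cells."""
    results = []
    for cells in rows:
        try:
            customer = cells[2].split()
            move_date = cells[3]
        except IndexError:
            # Fewer than four cells: not a data row, move on to the next one.
            continue
        results.append(
            {"job_no": cells[0], "customer": customer, "move_date": move_date}
        )
    return results


# Hypothetical rows: the first mimics a header row with no <td> cells.
rows = [
    [],
    ["J4074407", "Long_Dist.", "Newlead2 User:SAM", "01/18/2016"],
]
print(json.dumps(collect_jobs(rows)))
```

Note that this approach drops short rows silently, so if no results come back it is worth printing the cell counts to see what the parser actually produced.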