from bs4 import BeautifulSoup
import requests

url = 'http://new.cpc.com.tw/division/mb/oil-more4.aspx'
html = requests.get(url).text
sp = BeautifulSoup(html, 'html.parser')
data = sp.find_all('span', {'id':'Showtd'})
rows = data[0].find_all('tr')

prices = list()
for row in rows:
    cols = row.find_all('td')
    if len(cols[1].text) > 0:
        item = [cols[0].text, cols[1].text, cols[2].text, cols[3].text]
        prices.append(item)

for p in prices:
    print(p)
I get the following error:
>IndexError Traceback (most recent call last)
><ipython-input-4-0e950be61842> in <module>()
> 10 sp = BeautifulSoup(html, 'html.parser')
> 11 data = sp.find_all('span', {'id':'Showtd'})
>---> 12 rows = data[0].find_all('tr')
> 13
> 14 prices = list()
>IndexError: list index out of range
Answer 0 (score: 1)
Change this
url = 'http://new.cpc.com.tw/division/mb/oil-more4.aspx'
to
url = 'https://new.cpc.com.tw/division/mb/oil-more4.aspx'
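Otherwise the response you actually get is a message about an SSL redirect: it contains no table at all and is not the page you expected, so find_all returns an empty list and data[0] raises the IndexError. With https it works for me.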
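To make this kind of failure easier to spot, here is a minimal sketch using only the standard library. The helper names force_https and first_match are hypothetical (they are not part of requests or BeautifulSoup). The first switches a URL's scheme to https; the second turns an empty find_all result into a readable error instead of a bare IndexError.

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(url):
    """Return url with its scheme replaced by https (hypothetical helper)."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(scheme='https'))


def first_match(matches, what):
    """Return the first item of a find_all()-style list (hypothetical helper).

    Raises ValueError with a descriptive message when the list is empty,
    which usually means the response was not the expected page.
    """
    if not matches:
        raise ValueError(
            what + ' not found; the response may not be the expected page'
        )
    return matches[0]


print(force_https('http://new.cpc.com.tw/division/mb/oil-more4.aspx'))
```

In the original script this would be used, for example, as html = requests.get(force_https(url)).text followed by rows = first_match(sp.find_all('span', {'id':'Showtd'}), 'span#Showtd').find_all('tr').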