I'm trying to figure out why my web scrape stops partway through.
My code:
import requests, re
from bs4 import BeautifulSoup

url = "http://odds.aussportsbetting.com/betting?competitionid=15"
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html')

# get market
m_data = soup.find_all('div', {'class': 'tabContent'})
for items in m_data:
    all_rows = items.findAll('tr')
    for data in all_rows:
        game = data.findAll('a', {'title': 'Click To Compare Odds'})
        market = data.findAll('a', {'title': 'Click To Compare Odds Sorted By Best Bookmaker Odds'})
        for g_row in game:
            text = ''.join(g_row.findAll(text=True))
            g_data = text.strip()
            print(g_data)
        for g_row in market:
            text = ''.join(g_row.findAll(text=True))
            g_data = text.strip()
            print(g_data)
My output:
Cleveland @ Cincinnati
102.93
Miami @ Buffalo
102.27
Green Bay @ Carolina
123.42
St Louis @ Minnesota
102.92
Washington @ New England
101.85
Tennessee @ New Orleans
185.93
Jacksonville @ New York Jets
189.21
Oakland @ Pittsburgh
102.51
Atlanta @ San Francisco
101.75
If you look at this link, you will see that there is more data to be scraped, but the output just stops. Can you help determine why?
Answer 0: (score: 1)
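You might want to change the parser. Passing 'html' is prone to fail: it names a markup type rather than a specific parser, so BeautifulSoup chooses among whichever parsers happen to be installed. Name one explicitly instead, such as html.parser or lxml: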
soup = BeautifulSoup(r.content,'html.parser')
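As a rough sketch, this is how the loop from the question might look with the parser named explicitly (Python 3). The function name extract_odds and the SAMPLE markup are made up for illustration; the real page's structure may differ.

```python
from bs4 import BeautifulSoup

GAME_TITLE = "Click To Compare Odds"
MARKET_TITLE = "Click To Compare Odds Sorted By Best Bookmaker Odds"

# Hypothetical stand-in for the real page, which is not reproduced here.
SAMPLE = """
<div class="tabContent">
  <table>
    <tr>
      <td><a title="Click To Compare Odds">Miami @ Buffalo</a></td>
      <td><a title="Click To Compare Odds Sorted By Best Bookmaker Odds">102.27</a></td>
    </tr>
  </table>
</div>
"""


def extract_odds(html, parser="html.parser"):
    """Collect the game name and market text from each table row, in page order."""
    soup = BeautifulSoup(html, parser)  # parser is named explicitly
    results = []
    for block in soup.find_all("div", {"class": "tabContent"}):
        for row in block.find_all("tr"):
            # Games first, then markets, matching the order in the question.
            for title in (GAME_TITLE, MARKET_TITLE):
                for link in row.find_all("a", {"title": title}):
                    results.append(link.get_text(strip=True))
    return results


print(extract_odds(SAMPLE))
```

With the live site you would pass r.content instead of SAMPLE, and you could try parser="lxml" if that package is installed.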
For more on the recommended parsers, see the BeautifulSoup docs.
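To get a feel for how much the parser choice matters, a small sketch like the one below feeds the same broken markup to several parsers and prints the tree each one builds. compare_parsers is a hypothetical helper, not part of BeautifulSoup; lxml and html5lib are optional installs, so a missing one is reported rather than treated as an error.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def compare_parsers(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser_name: serialized tree}; None marks a parser that is not installed."""
    trees = {}
    for name in parsers:
        try:
            trees[name] = str(BeautifulSoup(markup, name))
        except FeatureNotFound:
            # Raised by BeautifulSoup when the requested parser is unavailable.
            trees[name] = None
    return trees


# An unclosed <a> followed by a stray </p>: invalid HTML that each parser repairs its own way.
for name, tree in compare_parsers("<a></p>").items():
    print(name, "->", tree if tree is not None else "(not installed)")
```

If the trees differ on a tiny snippet like this, they can differ much more on a large, messy page, which is one plausible reason a scrape finds only part of the rows.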