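I'm trying to use BeautifulSoup to scrape both the first and second tags (-130 and +110) inside this single HTML div (shown below): example HTML
However, I can't figure out how to scrape the second tag; I can only get the first. Thanks.
from urllib.request import urlopen as uReq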
from bs4 import BeautifulSoup as soup
day = "09"
month = "10"
year = "2017"
my_url = 'https://www.sportsbookreview.com/betting-odds/mlb-baseball/?date=' + year + month + day
# Opening up the connection and grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
# html parser
page_soup = soup(page_html, "html.parser")
allBovadaOdds = page_soup.find_all("div", {"rel": "999996"})
firstOdds = allBovadaOdds[1].b.string
print(firstOdds)
Answer 0 (score: 1)
You can try using soup.select() to filter the <b> tags, then loop with for i in range(): to pick out every second tag. Note that the step passed to range() should be 2.
# html parser
page_soup = soup(page_html, "html.parser")
allBovadaOdds = page_soup.select('div[rel="999996"] b')
print(allBovadaOdds)
for i in range(1, len(allBovadaOdds), 2):
    SecondOdds = allBovadaOdds[i].string
    print(SecondOdds)
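To make the indexing concrete, here is a minimal sketch that runs against a small inline HTML snippet instead of the live page. The markup and odds values in sample_html are made up for illustration, and it assumes every matching div holds exactly two <b> tags; only select() and the step of 2 come from the code above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page: two divs with
# rel="999996", each assumed to contain exactly two <b> tags.
sample_html = """
<div rel="999996"><b>-130</b><b>+110</b></div>
<div rel="999996"><b>-105</b><b>-115</b></div>
"""

page_soup = BeautifulSoup(sample_html, "html.parser")

# Flat list of every <b> inside a matching div, in document order.
all_odds = page_soup.select('div[rel="999996"] b')

# Even indices (0, 2, ...) are the first tag of each pair,
# odd indices (1, 3, ...) are the second tag.
first_odds = [str(all_odds[i].string) for i in range(0, len(all_odds), 2)]
second_odds = [str(all_odds[i].string) for i in range(1, len(all_odds), 2)]

print(first_odds)
print(second_odds)
```

If a div ever contains more or fewer than two <b> tags, the flat list drifts out of step, so this approach only holds while that assumption does.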
Answer 1 (score: 1)
I think what you want is fairly simple to write.
>>> import bs4
>>> import requests
>>> page = requests.get('https://www.sportsbookreview.com/betting-odds/mlb-baseball/?date=20171009').text
>>> soup = bs4.BeautifulSoup(page, 'lxml')
>>> soup.select('#eventLine-3330496-43 b')
[<b>-130</b>, <b>+110</b>]
>>> for item in soup.select('#eventLine-3330496-43 b'):
...     item.text
...
'-130'
'+110'
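When the event id is not known in advance, one variation is to loop over each container and read its <b> tags as a pair. This is only a sketch: sample_html, its ids and its values are invented for illustration, and the id prefix selector is an assumption about how the real page names its divs.

```python
import bs4

# Hypothetical markup: the ids and odds below are invented for illustration.
sample_html = """
<div id="eventLine-1-43"><b>-130</b><b>+110</b></div>
<div id="eventLine-2-43"><b>-105</b><b>-115</b></div>
"""

soup = bs4.BeautifulSoup(sample_html, "html.parser")

# Map each container's id to the text of all <b> tags inside it.
pairs = {}
for div in soup.select('div[id^="eventLine-"]'):
    pairs[div["id"]] = [b.get_text() for b in div.select("b")]

print(pairs)
```

Grouping per container keeps the first and second values together even if some div has a different number of <b> tags.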
However, I noticed two potential problems: