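I'm using BS4 to pull information from the following page: https://www.sportsbookreview.com/betting-odds/mlb-baseball/money-line/?date=20171029
The problem I'm running into is getting the inning-by-inning box score back in a usable format. Ideally I'd like to save the scores as a list by half inning, looking like ['3','0','0','0'...]. So far I have only been able to return ['30','00'...].
Both scores sit inside elements of the same class, which I think is what's causing the problem: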
<div class="_2563p">
<div class="_1Y3rN _308Yc">
<div>3</div>
<div>0</div></div>
<div class="_1Y3rN _308Yc">
<div>0</div>
<div>0</div>
</div>
I can currently return the grouped scores ['30','00'...] using the following:
import bs4, pandas as pd, re
from datetime import datetime
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome(r'C:\Users\grant\PythonScripts\chromedriver.exe')
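betDate = '20171029'  # assumed: the date from the URL above; betDate was not defined in the original snippet
url = ('https://www.sportsbookreview.com/betting-odds/mlb-baseball/?date=' + betDate) # Full MLs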
driver.get(url)
fullML = driver.page_source
driver.quit()
fullMLsoup = bs4.BeautifulSoup(fullML, 'html.parser')
x = [el.text.strip() for el in fullMLsoup.find_all(re.compile(r'div'), {'class':"_1Y3rN _308Yc"})]
print(x)
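Can anybody help me return the ['3','0','0','0'] format I'm looking for?
Answer 0 (score: 0)
With bs4 4.7.1 you can use the :first-child and :nth-child CSS selectors. This covers everything up to the last two columns.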
import requests
from bs4 import BeautifulSoup as bs
r = requests.get('https://www.sportsbookreview.com/betting-odds/mlb-baseball/money-line/?date=20171029')
soup = bs(r.content, 'lxml')
top = [item.text for item in soup.select('._308Yc div:first-child')]
bottom = [item.text for item in soup.select('._308Yc div:nth-child(2)')]
print(top, bottom)
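If you want the single flat list from the question rather than separate top and bottom lists, one option is to select every direct child div of the score cells in document order. This is a minimal sketch run against the HTML fragment from the question (with the outer div closed so it parses cleanly), so it needs no network access. The function name half_inning_scores is hypothetical, and it assumes the live page still uses the _308Yc class.

```python
from bs4 import BeautifulSoup

# HTML fragment copied from the question; the outer div is closed here
# so the fragment is well-formed.
html = """
<div class="_2563p">
<div class="_1Y3rN _308Yc">
<div>3</div>
<div>0</div></div>
<div class="_1Y3rN _308Yc">
<div>0</div>
<div>0</div>
</div>
</div>
"""


def half_inning_scores(markup):
    """Return one score per half inning, in document order (hypothetical helper)."""
    soup = BeautifulSoup(markup, "html.parser")
    # "._308Yc > div" matches each direct child div of a score cell,
    # so the top and bottom scores come back as separate items.
    return [div.get_text(strip=True) for div in soup.select("._308Yc > div")]


scores = half_inning_scores(html)
print(scores)
```

Selecting the inner divs avoids the ['30','00'...] result, which comes from calling .text on the wrapper and getting both children joined together.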
Alternatively, you can pull everything out of the page as JSON with a regex and then parse the JSON.
import requests
import re
import json
r = requests.get('https://www.sportsbookreview.com/betting-odds/mlb-baseball/money-line/?date=20171029')
p = re.compile(r'window.__INITIAL_STATE__=(.*?);\n', re.DOTALL)
data = json.loads(p.findall(r.text)[0])
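To show how the regex step behaves, here is a small sketch that runs on a made-up page string instead of the live response. The page_text value and its "events"/"scores" keys are purely hypothetical stand-ins; the real structure of window.__INITIAL_STATE__ on the site is not shown in this thread, so inspect data yourself before relying on any key.

```python
import json
import re

# Hypothetical stand-in for r.text; the real JSON layout will differ.
page_text = (
    '<script>window.__INITIAL_STATE__='
    '{"events": [{"id": 1, "scores": ["3", "0"]}]};\n'
    '</script>'
)

# Same idea as the pattern above, with the dot escaped so it only matches a literal ".".
pattern = re.compile(r'window\.__INITIAL_STATE__=(.*?);\n', re.DOTALL)

match = pattern.search(page_text)
if match is None:
    # Fail with a clear message instead of an IndexError on findall(...)[0].
    raise ValueError("__INITIAL_STATE__ not found; the page layout may have changed")

data = json.loads(match.group(1))
print(sorted(data.keys()))
```

Using search with an explicit check is just a design choice here: it gives a readable error if the site stops embedding the state object.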