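Interesting problem. I'm scraping a betting site with Selenium and then processing it with bs4. The trouble is that the site loads the odds differently from the way it loads the team names. For example: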
London v Tokyo 2/1 4/1
Amsterdam v Helsinki 5/1 3/1
New York v California 7/1 10/1
When I pull the data and iterate over it, it comes out like this:
Names = [London, Tokyo, Amsterdam, Helsinki]
Odds = [2/1, 5/1, 4/1, 3/1, 7/1, 10/1]
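The odds are loaded top to bottom, then left to right, in blocks of different lengths. This means that when I try to stitch the names and odds together, they don't match.
My question is: how do I fix this? I want to end up with each team name followed by its odds: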
Games = [London, 2/1, Tokyo, 4/1, Amsterdam, 5/1, Helsinki, 3/1, New York, 7/1, California, 10/1]
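**Update** The site is: https://www.bet365.com/#/AC/B151/C1/D50/E2/F163/ If you get the landing page, just click through it. Then choose "Esports" in the left-hand panel, then "All Matches" in the middle pane.
Code: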
from selenium import webdriver
from bs4 import BeautifulSoup
url = "https://www.bet365.com/#/AC/B151/C1/D50/E2/F163/"
driver = webdriver.Chrome()
driver.get(url)
# Then I'm navigating to the "All Matches" page
soup = BeautifulSoup(driver.page_source, 'html.parser')
teams = driver.find_elements_by_class_name("sl-CouponParticipantWithBookCloses_Name")
odds_raw = driver.find_elements_by_class_name("gl-ParticipantOddsOnly_Odds")
odds = []
teams_text = []
new_teams = []
new_odds = []
for name in teams:
    teams_text.append(name.text)
The teams come in as a single block, e.g. "London v Tokyo", so to separate the team names I iterate over them and split each one:
for name in teams_text:
    first, second = name.split(" v ")
    new_teams.append(first)
    new_teams.append(second)
Then I convert the fractional odds I receive to decimal:
for odd in odds_raw:
    odds.append(odd.text)
for odd in odds:
    first, second = odd.split("/")
    new_odd = (int(first) / int(second)) + 1
    new_odds.append(round(new_odd, 2))
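The conversion step can also be written as a small helper. This is only a sketch: the name `fractional_to_decimal` is made up for illustration, and it assumes every odds string has the form "numerator/denominator" (anything else, such as a blank or suspended price, would raise an error).

```python
from fractions import Fraction


def fractional_to_decimal(odd: str) -> float:
    """Convert fractional odds such as "5/2" to decimal odds.

    Decimal odds are the fractional value plus 1 (the returned stake).
    """
    numerator, denominator = odd.strip().split("/")
    decimal = Fraction(int(numerator), int(denominator)) + 1
    return round(float(decimal), 2)


print([fractional_to_decimal(o) for o in ["2/1", "5/1", "4/1", "10/1", "5/2"]])
# prints [3.0, 6.0, 5.0, 11.0, 3.5]
```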
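So now I have a list of all the team names and a list of the decimal odds values. This is where my problem lies. The way bet365 lays out the match odds, the vertical blocks for each division have different lengths.
So if the odds look like this: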
Division 1
London v Tokyo 1 2
Amsterdam v Helsinki 3 4
Division 2
New York v California 5 6
Division 3
Sydney v Brisbane 7 8
Bali v Singapore 9 10
Berlin v Paris 11 12
then when I pull them, the odds come out like this:
[1, 3, 2, 4, 5, 6, 7, 9, 11, 8, 10, 12]
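Because the divisions have different lengths, I'm struggling to work out how to handle this.
Answer 0: (score: 0)
You can use a regular expression to capture the elements.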
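import re
s = '''London v Tokyo 2/1 4/1 Amsterdam v Helsinki 5/1 3/1 New York v California 7/1 10/1'''
# a team name is one or more words separated by single spaces, so "New York" is captured whole
name = r'[A-Za-z]+(?: [A-Za-z]+)*?'
re.findall(rf'({name})\s+v\s+({name})\s+(\d+/\d+)\s+(\d+/\d+)', s)
[('London', 'Tokyo', '2/1', '4/1'),
('Amsterdam', 'Helsinki', '5/1', '3/1'),
('New York', 'California', '7/1', '10/1')]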
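To get from matched tuples to the flat list asked for in the question, the groups can be reordered while flattening. The following is a rough sketch rather than a drop-in solution: it assumes each match row is available as text containing both names and both odds (which may not be how the page delivers them), and the names `NAME`, `ROW` and `games` are illustrative.

```python
import re

text = "London v Tokyo 2/1 4/1 Amsterdam v Helsinki 5/1 3/1 New York v California 7/1 10/1"

# Assumption: a team name is one or more alphabetic words separated by single spaces.
NAME = r"[A-Za-z]+(?: [A-Za-z]+)*?"
ROW = re.compile(rf"({NAME})\s+v\s+({NAME})\s+(\d+/\d+)\s+(\d+/\d+)")

games = []
for home, away, home_odd, away_odd in ROW.findall(text):
    # Put each team next to its own odd: name, odd, name, odd
    games.extend([home, home_odd, away, away_odd])

print(games)
# prints ['London', '2/1', 'Tokyo', '4/1', 'Amsterdam', '5/1', 'Helsinki', '3/1', 'New York', '7/1', 'California', '10/1']
```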
Answer 1: (score: 0)
You can achieve the desired output with a for loop like this:
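Names = ["London", "Tokyo", "Amsterdam", "Helsinki", "New York", "California"]
Odds = [2/1, 5/1, 4/1, 3/1, 7/1, 10/1]  # unquoted, so Python evaluates each one as float division
start_nmb = 1
for odd in Odds:
    # insert each odd directly after the name at the same position
    Names.insert(start_nmb, odd)
    start_nmb += 2
print(Names)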
Output:
['London', 2.0, 'Tokyo', 5.0, 'Amsterdam', 4.0, 'Helsinki', 3.0, 'New York', 7.0, 'California', 10.0]
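The same interleaving can be done without modifying the names list in place. This is a sketch under one important assumption: the odds have already been put in the same order as the names, so it does not solve the reordering problem by itself. The function name `interleave` is hypothetical.

```python
from itertools import chain


def interleave(names, odds):
    """Return [name0, odd0, name1, odd1, ...] without mutating either input."""
    if len(names) != len(odds):
        raise ValueError("names and odds must have the same length")
    return list(chain.from_iterable(zip(names, odds)))


names = ["London", "Tokyo", "Amsterdam", "Helsinki", "New York", "California"]
odds = ["2/1", "4/1", "5/1", "3/1", "7/1", "10/1"]  # assumed to be aligned with names already

games = interleave(names, odds)
print(games)
# prints ['London', '2/1', 'Tokyo', '4/1', 'Amsterdam', '5/1', 'Helsinki', '3/1', 'New York', '7/1', 'California', '10/1']
```

Keeping the odds as strings avoids the accidental float division that happens when `2/1` is written without quotes.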
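Hope this helps!
Answer 2: (score: 0)
This is a long-winded way of doing it. The odd-numbered blocks of odds (as determined by the loop counter) belong to team 1, the left-hand side of "team 1 v team 2"; the even-numbered blocks belong to team 2. The lists of lists are then flattened, and finally the lists are merged by alternating their members, as shown in the answer by @user942640.
Note: this relies on the lists being of equal length to return accurate results.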
import itertools
from bs4 import BeautifulSoup as bs
#your existing code to get to page and wait for presence of all elements
soup = bs(driver.page_source, 'lxml')
teams = [item.text.split(' v ') for item in soup.select('.sl-CouponParticipantWithBookCloses_NameContainer')]
i = 0
team1 = []
team2 = []
for item in soup.select('.sl-MarketCouponValuesExplicit2'):
    # odds columns alternate: even index = team 1 column, odd index = team 2 column
    if i % 2 == 0:
        team1.append([el.text for el in item.select('div:not(.gl-MarketColumnHeader )')])
    else:
        team2.append([el.text for el in item.select('div:not(.gl-MarketColumnHeader )')])
    i += 1
team1 = [item for sublist in team1 for item in sublist]
team2 = [item for sublist in team2 for item in sublist]
teams = [item for sublist in teams for item in sublist]
team_odds = [x for x in itertools.chain.from_iterable(itertools.zip_longest(team1,team2)) if x]
final = [x for x in itertools.chain.from_iterable(itertools.zip_longest(teams, team_odds)) if x]
print(final)
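To see the flatten-and-alternate steps without a browser, here is a sketch on toy data. The `columns` and `teams` lists are invented stand-ins for what the two selectors return, using the division layout from the question; real page content will differ.

```python
from itertools import chain, zip_longest

# Toy stand-in: one inner list per odds column, in page order
# (division 1 left, division 1 right, division 2 left, ...).
columns = [["1", "3"], ["2", "4"], ["5"], ["6"], ["7", "9", "11"], ["8", "10", "12"]]
# Toy stand-in: each "A v B" string already split on " v ".
teams = [["London", "Tokyo"], ["Amsterdam", "Helsinki"], ["New York", "California"],
         ["Sydney", "Brisbane"], ["Bali", "Singapore"], ["Berlin", "Paris"]]

team1 = list(chain.from_iterable(columns[0::2]))  # every left-hand column, flattened
team2 = list(chain.from_iterable(columns[1::2]))  # every right-hand column, flattened
flat_teams = list(chain.from_iterable(teams))

# Alternate team 1 and team 2 odds, then alternate names and odds.
team_odds = [x for x in chain.from_iterable(zip_longest(team1, team2)) if x]
final = [x for x in chain.from_iterable(zip_longest(flat_teams, team_odds)) if x]
print(final)
```

If one list is shorter than the other, `zip_longest` pads it with `None`, the `if x` filter drops the padding, and every later pairing shifts by one, which is why the equal-length requirement matters.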
So, something like this (note that the odds are updating all the time):
from selenium import webdriver
import itertools
from bs4 import BeautifulSoup as bs
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.get('https://www.bet365.com/#/HO/')
driver.get('https://www.bet365.com/#/AC/B151/C1/D50/E2/F163/')
WebDriverWait(driver,10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".sl-MarketCouponValuesExplicit2")))
soup = bs(driver.page_source, 'lxml')
teams = [item.text.split(' v ') for item in soup.select('.sl-CouponParticipantWithBookCloses_NameContainer')]
i = 0
team1 = []
team2 = []
for item in soup.select('.sl-MarketCouponValuesExplicit2'):
    # odds columns alternate: even index = team 1 column, odd index = team 2 column
    if i % 2 == 0:
        team1.append([el.text for el in item.select('div:not(.gl-MarketColumnHeader )')])
    else:
        team2.append([el.text for el in item.select('div:not(.gl-MarketColumnHeader )')])
    i += 1
team1 = [item for sublist in team1 for item in sublist]
team2 = [item for sublist in team2 for item in sublist]
teams = [item for sublist in teams for item in sublist]
team_odds = [x for x in itertools.chain.from_iterable(itertools.zip_longest(team1,team2)) if x]
final = [x for x in itertools.chain.from_iterable(itertools.zip_longest(teams, team_odds)) if x]
print(final)
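If the odds have already been collected into one flat list, as in the question, they can still be regrouped provided the number of matches in each division is known. The sketch below assumes those sizes can be obtained somehow (for example by counting the match rows under each division header); `regroup_odds` and `division_sizes` are hypothetical names, not part of any library.

```python
def regroup_odds(flat_odds, division_sizes):
    """Turn per-division column-major odds into one (team 1, team 2) pair per match."""
    pairs = []
    pos = 0
    for size in division_sizes:
        left = flat_odds[pos:pos + size]               # team 1 column of this division
        right = flat_odds[pos + size:pos + 2 * size]   # team 2 column of this division
        pairs.extend(zip(left, right))
        pos += 2 * size
    if pos != len(flat_odds):
        raise ValueError("division sizes do not account for every odd")
    return pairs


flat = [1, 3, 2, 4, 5, 6, 7, 9, 11, 8, 10, 12]
print(regroup_odds(flat, [2, 1, 3]))
# prints [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10), (11, 12)]
```

Each pair then lines up with one "A v B" row in page order, so the names and odds can be zipped together directly.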