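Getting separate pieces of text from the same line with BS4

Time: 2018-10-07 15:27:25

Tags: python beautifulsoup

Using Beautiful Soup, how can I get the team names and the points into two separate lists from the following list: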

[<div class="name">Man. City<span class="record">19 pts</span></div>,
 <div class="name">Liverpool<span class="record">19 pts</span></div>,
 <div class="name">Chelsea<span class="record">17 pts</span></div>,
 <div class="name">Tottenham<span class="record">18 pts</span></div>,
 <div class="name">Arsenal<span class="record">18 pts</span></div>,
 <div class="name">Man. United<span class="record">13 pts</span></div>,
 <div class="name">Bournemouth<span class="record">16 pts</span></div>,
 <div class="name">Leicester City<span class="record">12 pts</span></div>,
 <div class="name">Wolverhampton<span class="record">15 pts</span></div>,
 <div class="name">Watford<span class="record">13 pts</span></div>,
 <div class="name">Everton<span class="record">12 pts</span></div>,
 <div class="name">West Ham<span class="record">7 pts</span></div>,
 <div class="name">Crystal Palace<span class="record">7 pts</span></div>,
 <div class="name">Brighton<span class="record">8 pts</span></div>,
 <div class="name">Southampton<span class="record">5 pts</span></div>,
 <div class="name">Newcastle<span class="record">2 pts</span></div>,
 <div class="name">Burnley<span class="record">8 pts</span></div>,
 <div class="name">Fulham<span class="record">5 pts</span></div>,
 <div class="name">Huddersfield<span class="record">3 pts</span></div>,
 <div class="name">Cardiff City<span class="record">2 pts</span></div>]

2 answers:

Answer 0 (score: 1)

I think I just answered your other question. In any case, you can do something like this:

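import requests
from bs4 import BeautifulSoup

# Download the page and parse it with the built-in HTML parser.
r = requests.get('https://projects.fivethirtyeight.com/soccer-predictions/premier-league/')
soup = BeautifulSoup(r.content, 'html.parser')

# Walk every row of each forecast table.
for table in soup.find_all('table', attrs={'class': 'forecast-table'}):
    for row in table.find_all('tr'):
        name = row.find('div', attrs={'class': 'name'})
        pts = row.find('span', attrs={'class': 'record'})
        # Skip rows that lack either element (for example header rows)
        # instead of hiding every error behind a bare except.
        if name is None or pts is None:
            continue
        # next_element is the first text node inside each tag.
        print('Name:', name.next_element, 'Pts:', pts.next_element)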

Output:

Name: Man. City Pts: 19 pts
Name: Liverpool Pts: 19 pts
Name: Chelsea Pts: 20 pts
Name: Tottenham Pts: 18 pts
Name: Arsenal Pts: 18 pts
Name: Man. United Pts: 13 pts
Name: Bournemouth Pts: 16 pts
Name: Leicester City Pts: 12 pts
Name: Wolverhampton Pts: 15 pts
Name: Watford Pts: 13 pts
Name: Everton Pts: 12 pts
Name: West Ham Pts: 7 pts
Name: Crystal Palace Pts: 7 pts
Name: Brighton Pts: 8 pts
Name: Southampton Pts: 5 pts
Name: Newcastle Pts: 2 pts
Name: Fulham Pts: 5 pts
Name: Burnley Pts: 8 pts
Name: Huddersfield Pts: 3 pts
Name: Cardiff City Pts: 2 pts
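
The loop above prints each pair, while the question asks for two separate lists. The same idea can be sketched directly against the div elements from the question. This is only a minimal sketch: the `html` string is a shortened sample of the question's markup, the variable names `names` and `points` are illustrative, and it assumes the team name is always the first child of each div.

```python
from bs4 import BeautifulSoup

# Shortened sample of the markup from the question (assumed structure).
html = """
<div class="name">Man. City<span class="record">19 pts</span></div>
<div class="name">Liverpool<span class="record">19 pts</span></div>
<div class="name">Chelsea<span class="record">17 pts</span></div>
"""

soup = BeautifulSoup(html, "html.parser")

names = []
points = []
for div in soup.find_all("div", class_="name"):
    span = div.find("span", class_="record")
    if span is None:
        continue  # skip divs without a points span
    # The div's first child is the text node holding the team name;
    # the span with the points follows it.
    names.append(div.contents[0].strip())
    points.append(span.get_text(strip=True))

print(names)   # ['Man. City', 'Liverpool', 'Chelsea']
print(points)  # ['19 pts', '19 pts', '17 pts']
```

Reading the name from the div's own text node, rather than from `div.text`, avoids having to split the combined string "Man. City19 pts" afterwards.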

Answer 1 (score: 0)

You can separate the points and the team names like this:

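from bs4 import BeautifulSoup as bs

# `html` is assumed to hold the markup shown in the question.
s = bs(html, 'html.parser')

# Combined text of each div, e.g. "Man. City19 pts".
teams = [div.text for div in s.find_all("div")]

point = []
name = []

for team in teams:
    # Split at the first digit: the name precedes it, the points start at it.
    for count, char in enumerate(team):
        if char.isdigit():
            point.append(team[count:])
            name.append(team[:count])
            break

print(point)
print(name)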

The output looks like this:

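['19 pts', '19 pts', '17 pts', '18 pts', '18 pts', '13 pts', '16 pts', '12 pts', '15 pts', '13 pts', '12 pts', '7 pts', '7 pts', '8 pts', '5 pts', '2 pts', '8 pts', '5 pts', '3 pts', '2 pts']

['Man. City', 'Liverpool', 'Chelsea', 'Tottenham', 'Arsenal', 'Man. United', 'Bournemouth', 'Leicester City', 'Wolverhampton', 'Watford', 'Everton', 'West Ham', 'Crystal Palace', 'Brighton', 'Southampton', 'Newcastle', 'Burnley', 'Fulham', 'Huddersfield', 'Cardiff City']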
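
The character scan in this answer amounts to splitting each string at its first digit. As an alternative sketch (not the answerer's code), the same split can be written with a regular expression from the standard library, which also makes it easy to store the points as integers. The `teams` list here is a hypothetical sample of the combined div texts, and the pattern assumes that team names contain no digits and that every entry ends in "pts".

```python
import re

# Hypothetical sample of the combined div texts, e.g. "Man. City19 pts".
teams = ["Man. City19 pts", "West Ham7 pts", "Cardiff City2 pts"]

# name: the non-digit characters before the first digit;
# pts: the run of digits that precedes " pts".
pattern = re.compile(r"^(?P<name>\D+?)\s*(?P<pts>\d+)\s*pts$")

names = []
points = []
for team in teams:
    match = pattern.match(team)
    if match is None:
        continue  # text does not follow the "<name><n> pts" shape
    names.append(match.group("name"))
    points.append(int(match.group("pts")))

print(names)   # ['Man. City', 'West Ham', 'Cardiff City']
print(points)  # [19, 7, 2]
```

Keeping the points as integers makes later steps such as sorting or summing straightforward, at the cost of dropping the " pts" suffix.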