Removing HTML tags in Python when scraping

Asked: 2017-03-16 14:59:40

Tags: python html

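I am trying to scrape the box score of an NBA game from ESPN. I want to get the player names first, but I am having trouble getting rid of the HTML tags.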

I have tried using

get_text(), .text(), .string_strip()

but they keep giving me errors.

This is the code I am using right now.

from bs4 import BeautifulSoup
import requests

url= "http://scores.espn.com/nba/boxscore?gameId=400900407"
r = requests.get(url)
soup = BeautifulSoup(r.text,"html.parser")

name = []
for row in soup.find_all('tr')[1:]:
        player_name = row.find('td', attrs={'class': 'name'})
        name.append(player_name)
print(name)

3 Answers:

Answer 0 (score: 4)

Using player_name.text should work, but the problem is that row.find('td', attrs={'class': 'name'}) sometimes returns None, and None has no .text attribute. Try it like this:

if player_name:
    name.append(player_name.text)
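
For reference, here is a minimal sketch of the whole loop with that check in place. It runs against a small inline table rather than the live ESPN page, so SAMPLE_HTML, the player names in it, and the helper extract_names are all made up for illustration; the real page's markup may differ. Note that get_text() is a method while .text is a property, so .text() with parentheses will not work.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the ESPN box score page.
SAMPLE_HTML = """
<table>
  <tr><th>Starters</th><th>MIN</th></tr>
  <tr><td class="name"> <a href="#">Player One</a> </td><td>30</td></tr>
  <tr><td colspan="2">Team totals</td></tr>
  <tr><td class="name"><a href="#">Player Two</a></td><td>25</td></tr>
</table>
"""


def extract_names(html):
    """Return the text of every td.name cell, skipping rows without one."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for row in soup.find_all("tr"):
        cell = row.find("td", attrs={"class": "name"})
        if cell is None:
            # Header and totals rows have no td.name cell.
            continue
        # get_text(strip=True) drops the tags and surrounding whitespace.
        names.append(cell.get_text(strip=True))
    return names


names = extract_names(SAMPLE_HTML)
print(names)
```

Because rows without a matching cell are skipped explicitly, the [1:] slice on find_all('tr') is no longer needed to dodge the header row.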

Answer 1 (score: 2)

I solved the problem like this:

from bs4 import BeautifulSoup
import requests

url= "http://scores.espn.com/nba/boxscore?gameId=400900407"
r = requests.get(url)
soup = BeautifulSoup(r.text,"html.parser")

name = []
for row in soup.find_all('tr')[1:]:
    try:
        player_name = row.select('td.name span')[0].text
        name.append(player_name)
    except IndexError:
        # Rows with no td.name span (headers, totals) give an empty list.
        pass
print(name)
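
A possible variation on the same idea, as a sketch only: select_one() returns the first match or None, so the try/except can be replaced by a plain check. The inline SAMPLE_HTML below is invented for illustration and only assumes that the name sits in the first span inside td.name, as the selector above implies.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real ESPN page may be structured differently.
SAMPLE_HTML = """
<table>
  <tr><th>Starters</th></tr>
  <tr><td class="name"><a href="#"><span>Player One</span><span class="abbr">P. One</span></a></td></tr>
  <tr><td class="name">Team</td></tr>
  <tr><td class="name"><a href="#"><span>Player Two</span><span class="abbr">P. Two</span></a></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

names = []
for row in soup.find_all("tr"):
    # select_one returns the first matching element, or None if there is none.
    span = row.select_one("td.name span")
    if span is not None:
        names.append(span.text)

print(names)
```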

Answer 2 (score: 1)

My code, for your reference:

import requests

from pyquery import PyQuery as pyq

url= "http://scores.espn.com/nba/boxscore?gameId=400900407"
r = requests.get(url)
doc = pyq(r.content)
print([h.text() for h in doc('.abbr').items()])