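I'm trying to scrape the score of an NBA game from ESPN. I want to get the player names first, but I'm having trouble getting rid of the HTML tags.
I have tried using
get_text(), .text(), .string_strip()
but they keep giving me errors.
Here is the code I'm using right now: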
from bs4 import BeautifulSoup
import requests

url = "http://scores.espn.com/nba/boxscore?gameId=400900407"
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")

name = []
for row in soup.find_all('tr')[1:]:
    player_name = row.find('td', attrs={'class': 'name'})
    name.append(player_name)
print(name)
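Answer 0 (score: 4)
Using player_name.text
should work, but the problem is that row.find('td', attrs={'class': 'name'})
sometimes returns None, because not every row contains such a cell. Try it like this: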
    if player_name:
        name.append(player_name.text)
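To show why the check matters, here is a minimal sketch that runs on a small hand-written table instead of the live page. The markup in SAMPLE_HTML and the helper name extract_names are made up for illustration; the real ESPN page may be structured differently. Rows without a td.name cell make find() return None, and those rows are skipped.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real ESPN page may differ.
SAMPLE_HTML = """
<table>
  <tr><th>STARTERS</th><th>PTS</th></tr>
  <tr><td class="name"><span>A. Player</span></td><td>20</td></tr>
  <tr><td colspan="2">TEAM</td></tr>
  <tr><td class="name"><span>B. Player</span></td><td>12</td></tr>
</table>
"""


def extract_names(html):
    """Return the text of every td.name cell, skipping rows without one."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for row in soup.find_all("tr"):
        cell = row.find("td", attrs={"class": "name"})
        if cell is not None:  # find() returns None when nothing matches
            names.append(cell.get_text(strip=True))
    return names


names = extract_names(SAMPLE_HTML)
print(names)
```

Note that .text is an attribute, not a method, so calling .text() fails, while get_text() is a method and needs the parentheses.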
Answer 1 (score: 2)
I solved the problem like this:
from bs4 import BeautifulSoup
import requests

url = "http://scores.espn.com/nba/boxscore?gameId=400900407"
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")

name = []
for row in soup.find_all('tr')[1:]:
    try:
        # select() returns a list; [0] raises IndexError when the row has no match
        player_name = row.select('td.name span')[0].text
        name.append(player_name)
    except IndexError:
        pass
print(name)
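A possible variation that avoids the try/except, sketched under the same assumption that names sit in a span inside td.name: select_one() returns None instead of raising when nothing matches. The sample markup and the helper name names_from_spans are hypothetical, not taken from the ESPN page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real ESPN page may differ.
SAMPLE_HTML = """
<table>
  <tr><td class="name"><span>A. Player</span><span>PG</span></td></tr>
  <tr><td class="name">TEAM</td></tr>
  <tr><td class="name"><span>B. Player</span></td></tr>
</table>
"""


def names_from_spans(html):
    """Return the text of the first span inside each td.name cell."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for cell in soup.select("td.name"):
        span = cell.select_one("span")  # None when the cell has no span
        if span is not None:
            names.append(span.get_text(strip=True))
    return names


print(names_from_spans(SAMPLE_HTML))
```

Taking only the first span per cell mirrors the [0] index in the code above.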
Answer 2 (score: 1)
My code, for your reference:
import requests
from pyquery import PyQuery as pyq
url= "http://scores.espn.com/nba/boxscore?gameId=400900407"
r = requests.get(url)
doc = pyq(r.content)
print([h.text() for h in doc('.abbr').items()])