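I'm trying to scrape football results from a website. I fetch the results as HTML, and when I try to strip the markup with .text I get strange output. I use the parent attribute to get the parent HTML element that holds the whole score line.
Scraper script: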
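import requests
from bs4 import BeautifulSoup

# url is the address of the results page (not shown in the question)
response = requests.get(url)
html_soup = BeautifulSoup(response.text, 'html.parser')
# Find every <strong> tag whose text is exactly the chosen team name
results = html_soup.find_all('strong', string="East Wall Rovers")
chosen_team_results = []
for result in results:
    # .parent is the enclosing <p>, which holds both teams and both scores
    chosen_team_results.append(result.parent.text)
print(chosen_team_results)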
HTML:
<p class="zeta"><strong>
Killester Donnycarney FC</strong>
1
<strong>Cherry Orchard</strong>
2
</p>
<p class="zeta"><strong>
Ballymun United</strong>
2
<strong>Bluebell United</strong>
1
</p>
Output:
'\r\n\t\t\tValeview Shankill\r\n\t\t\t1\r\n\t\t\tEast Wall Rovers\r\n\t\t\t5\r\n\t\t\t\t\t\t', '\r\n\t\t\tMarks Celtic FC\r\n\t\t\t0\r\n\t\t\tEast Wall Rovers\r\n\t\t\t5\r\n\t\t\t\t\t\t', '\r\n\t\t\tBlessington FC\r\n\t\t\t0\r\n\t\t\tEast Wall Rovers\r\n\t\t\t5\r\n\t\t\t\t\t\t', '\r\n\t\t\tParkvale FC\r\n\t\t\t2\r\n\t\t\tEast Wall Rovers\r\n\t\t\t1\r\n\t\t\t\t\t\t', '\r\n\t\t\tBoyne Rovers\r\n\t\t\t1\r\n\t\t\tEast Wall Rovers\r\n\t\t\t1\r\n\t\t\t\t\t\t'
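I want the results displayed as plain text, just the teams and the scores.
Answer 0 (score: 0)
To get rid of the whitespace, I suggest doing the following: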
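for result in results:
    # split() with no arguments splits on any run of whitespace (\r, \n, \t, spaces);
    # joining with a single space keeps the team names and scores separated
    chosen_team_results.append(' '.join(result.parent.text.split()))
print(chosen_team_results)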
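If you want the teams and scores as separate values rather than one cleaned-up string, the same idea can be taken a step further. The following is only a sketch: it uses two of the raw strings from the output in the question as sample data, and parse_result and format_result are hypothetical helper names, not part of BeautifulSoup. It assumes every block has exactly four non-empty lines (team, score, team, score), as in the HTML shown above.

```python
# Sample strings copied from the output shown in the question.
raw_results = [
    '\r\n\t\t\tValeview Shankill\r\n\t\t\t1\r\n\t\t\tEast Wall Rovers\r\n\t\t\t5\r\n\t\t\t\t\t\t',
    '\r\n\t\t\tParkvale FC\r\n\t\t\t2\r\n\t\t\tEast Wall Rovers\r\n\t\t\t1\r\n\t\t\t\t\t\t',
]


def parse_result(raw):
    """Hypothetical helper: split one raw block into (team_a, score_a, team_b, score_b)."""
    # splitlines() handles the \r\n line endings; strip() removes the tabs;
    # lines that are empty after stripping are dropped.
    parts = [line.strip() for line in raw.splitlines() if line.strip()]
    if len(parts) != 4:
        # Assumption: team, score, team, score. Anything else is reported, not guessed.
        raise ValueError(f"unexpected layout: {parts!r}")
    team_a, score_a, team_b, score_b = parts
    return team_a, int(score_a), team_b, int(score_b)


def format_result(raw):
    """Hypothetical helper: render one raw block as a single readable line."""
    team_a, score_a, team_b, score_b = parse_result(raw)
    return f"{team_a} {score_a} - {score_b} {team_b}"


formatted = [format_result(raw) for raw in raw_results]
print(formatted)
```

In the scraper itself you would pass result.parent.text to format_result instead of the sample strings. BeautifulSoup can also do the stripping for you: result.parent.get_text(" ", strip=True) returns the text with each piece stripped and joined by a space, and result.parent.stripped_strings yields the stripped pieces one by one.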