Extracting data from an HTML tag using Beautiful Soup

Date: 2020-06-15 03:59:54

Tags: python web-scraping beautifulsoup

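This is my code. In the last four lines I am trying to get the player's name from the h3 tag.

When I use player_name = player1.h3, I get the correct h3 tag, i.e. <h3>Marc-André ter Stegen</h3>. However, I cannot get the inner text, Marc-André ter Stegen, with .text or get_text(); both return an empty string.

The same approach works fine in another program.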

from requests import get
from bs4 import BeautifulSoup



header = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36',
}

url = 'https://www.ea.com/games/fifa/fifa-20/ratings/fifa-20-player-ratings-top-100'

response = get(url, headers=header)
html_soup = BeautifulSoup(response.text, 'html.parser')

player_container = html_soup.find_all('ea-container', attrs={'slot': 'container'})
player1 = player_container[0]
player_name = player1.h3.get_text()  # or player1.h3.text
print(player_name)

# Result: empty string

1 Answer:

Answer 0: (score: 0)

Just iterate over the tag's children with for a in player1.h3: and print each one. Code:

from requests import get
from bs4 import BeautifulSoup



header = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36',
}

url = 'https://www.ea.com/games/fifa/fifa-20/ratings/fifa-20-player-ratings-top-100'

response = get(url, headers=header)
html_soup = BeautifulSoup(response.text, 'html.parser')

player_container = html_soup.find_all('ea-container', attrs={'slot': 'container'})
player1 = player_container[0]
# Iterating over a tag yields its direct children; here that is the text node.
for a in player1.h3:
    print(a)

Output:

Marc-André ter Stegen
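
If you want the name as a string rather than just printing it, the loop above can be wrapped in a small helper. The sketch below runs offline against a hypothetical stand-in for the page markup; SAMPLE_HTML and direct_text are names made up for illustration, not part of the original code. The <template> wrapper is an assumption about why get_text() came back empty: since Beautiful Soup 4.9.0, strings inside a <template> tag are stored as TemplateString, which get_text() skips by default. Whether the EA page is actually structured this way is not confirmed by the question.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString

# Hypothetical stand-in for the real page; the <template> wrapper is an assumption.
SAMPLE_HTML = """
<ea-container slot="container">
  <template>
    <h3>Marc-André ter Stegen</h3>
  </template>
</ea-container>
"""


def direct_text(tag):
    """Join the direct string children of a tag, ignoring HTML comments.

    isinstance() also matches NavigableString subclasses such as
    TemplateString, so this works no matter how the parser labelled the text.
    """
    parts = [
        str(child)
        for child in tag.children
        if isinstance(child, NavigableString) and not isinstance(child, Comment)
    ]
    return "".join(parts).strip()


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
player_container = soup.find_all("ea-container", attrs={"slot": "container"})
player1 = player_container[0]

player_name = direct_text(player1.h3)
print(player_name)
```

Because the helper reads the children directly instead of going through get_text(), it should give the same result on older and newer Beautiful Soup versions. Note that it only collects text sitting directly inside the tag, not text in nested tags.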