BeautifulSoup: How to get the <a> tag that's within an <h1>?

Time: 2018-11-16 16:18:24

Tags: python html beautifulsoup

I have been trying to extract the name from a Twitter profile. The only problem is that BeautifulSoup grabs the text of the entire element. I tried passing {"class": ...} to narrow down the element, but whenever I do that, I get:

AttributeError: 'NoneType' object has no attribute 'text'

My code:

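import urllib.request
from bs4 import BeautifulSoup

url = "https://twitter.com/barackobama"
html_doc = urllib.request.urlopen(url)
soup = BeautifulSoup(html_doc, 'lxml')

# .text on the <h1> returns the text of the whole element, children included
name = soup.find('h1').text
print(name)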

1 answer:

Answer 0 (score: 4)

If you want the text of the heading's child link rather than the full heading text, try:

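import urllib.request
from bs4 import BeautifulSoup

url = "https://twitter.com/barackobama"
html_doc = urllib.request.urlopen(url)
soup = BeautifulSoup(html_doc, 'lxml')

# .a selects the first <a> inside the <h1>, so only the link text is returned
name = soup.find('h1').a.text
print(name)
# 'Barack Obama'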
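
The AttributeError from the question appears whenever find() matches nothing and returns None, so a guarded version may be safer. Below is a minimal sketch that runs on an inline snippet instead of the live page. The markup, its class names, and the helper link_text_in_heading are made up for illustration and are not taken from the real Twitter page, and the built-in 'html.parser' is swapped in for 'lxml' only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may look different.
SAMPLE_HTML = """
<h1 class="ProfileHeaderCard-name">
  <a href="/BarackObama" class="ProfileHeaderCard-nameLink">Barack Obama</a>
  <span class="badge">Verified account</span>
</h1>
"""


def link_text_in_heading(html, class_name=None):
    """Return the text of the first <a> inside the first matching <h1>, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # Filter by class only when one is given (hypothetical parameter).
    heading = soup.find("h1", class_=class_name) if class_name else soup.find("h1")
    # find() returns None when nothing matches; calling .text on that None
    # is what raises the AttributeError from the question.
    if heading is None or heading.a is None:
        return None
    return heading.a.get_text(strip=True)


# Text of the whole <h1>, children included
full_text = BeautifulSoup(SAMPLE_HTML, "html.parser").find("h1").get_text(" ", strip=True)
# Text of the child link only
name = link_text_in_heading(SAMPLE_HTML)
# A class that does not exist yields None instead of an exception
missing = link_text_in_heading(SAMPLE_HTML, class_name="no-such-class")

print(full_text)
print(name)
print(missing)
```

Returning None leaves it to the caller to decide what to do when the heading or link is absent, for example when the class name passed to find() does not match what the fetched HTML actually contains.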