How to get the 'title=' attribute from an <a href> tag with Python BeautifulSoup

Date: 2016-07-28 19:32:33

Tags: python beautifulsoup

I'm stuck on a problem with Python 2.7.12, using BeautifulSoup to scrape some webpage data. I really can't figure out how to scrape the specific 'title=' attribute inside an <a href> link.

So far, I get output with this code:

    import urllib2
    from bs4 import BeautifulSoup

    hdr = {'Accept': 'text/html,application/xhtml+xml,*/*',
           "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36"}
    url = 'REMOVED'

    # Fetch the page once, sending the custom headers
    req = urllib2.Request(url, headers=hdr)
    html = urllib2.urlopen(req).read()
    soup = BeautifulSoup(html, "html5lib")

    # Every <td data-title="Navn"> cell holds one player link
    players = soup.find_all("td", {"data-title": "Navn"})

    saveFile = open('player_data.txt', 'w')

    for item in players:
        # contents[0] is the cell's first child: the whole <a> tag, not just its title
        player_data = item.contents[0].encode("utf-8")
        print player_data
        saveFile.write(player_data + "\n")  # one entry per line

    saveFile.close()

I get lines of data in this format:

    <a href="/da/player/123/lionel-messi/" title="Lionel Messi">Lionel Messi</a>

Could anyone please help me get the name from the 'title=' attribute? I just can't seem to get it working...

Thanks in advance :)

1 answer:

Answer 0 (score: 3)

To get the title from the <a href> tag:

    player_name = soup.find('a')['title']

Output:

    Lionel Messi
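The one-liner above searches the whole page and returns only the first <a> tag it finds. To collect the name from every row matched by the question's find_all() call, the same lookup can be done inside each <td> cell instead. This is a minimal sketch, assuming the rows look like the sample link in the question. The HTML string and the second player ("Player Two") are hypothetical stand-ins, since the real URL was removed. It uses the standard-library `html.parser` backend instead of `html5lib` so no extra parser is needed; the `print(...)` call works in both Python 2.7 and 3.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (its URL was removed),
# shaped like the cells matched by the question's find_all() call.
html = """
<table>
  <tr><td data-title="Navn"><a href="/da/player/123/lionel-messi/" title="Lionel Messi">Lionel Messi</a></td></tr>
  <tr><td data-title="Navn"><a href="/da/player/456/player-two/" title="Player Two">Player Two</a></td></tr>
  <tr><td data-title="Navn">no link in this cell</td></tr>
</table>
"""

# html.parser ships with Python; swap in "html5lib" if it is installed.
soup = BeautifulSoup(html, "html.parser")

names = []
for item in soup.find_all("td", {"data-title": "Navn"}):
    link = item.find("a")  # search only inside this cell, not the whole page
    title = link.get("title") if link is not None else None
    if title:  # skip cells without a link or without a title attribute
        names.append(title)

for name in names:
    print(name)
```

Checking for a missing link before reading the attribute skips odd rows instead of raising an error. In the question's code, each entry in `names` is what would be written to player_data.txt (encoded to UTF-8 under Python 2).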

What soup.find('a')['title'] does:

  • .find('a') finds the first <a href> tag
  • ['title'] gets the title attribute from that tag
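The ['title'] lookup works like a dictionary access on the tag's attributes. Here is a small sketch on the sample link from the question. It again uses `html.parser` rather than `html5lib`, which is an assumption made only to avoid an extra dependency. It also shows `.get()`, which returns None instead of raising KeyError when an attribute is absent. The `data-id` attribute name is purely hypothetical and used only as an example of a missing attribute.

```python
from bs4 import BeautifulSoup

# The sample link printed in the question
html = '<a href="/da/player/123/lionel-messi/" title="Lionel Messi">Lionel Messi</a>'
soup = BeautifulSoup(html, "html.parser")

link = soup.find("a")          # first <a> tag in the document
title = link["title"]          # attribute value, like a dict lookup
href = link.get("href")        # .get() reads attributes too
text = link.get_text()         # the text between <a> and </a>
attrs = link.attrs             # every attribute as a dict
missing = link.get("data-id")  # hypothetical absent attribute -> None

try:
    link["data-id"]            # bracket lookup on an absent attribute
    raised = False
except KeyError:
    raised = True

print(title)  # prints: Lionel Messi
```

Use ['title'] when every matched tag is guaranteed to have the attribute, and .get('title') when some may not.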