Beautiful Soup: find the first <a> whose title attribute equals a certain string

Time: 2017-07-25 15:35:23

Tags: python html web-scraping beautifulsoup

I'm working with Beautiful Soup and am trying to grab the first tag on a page whose title attribute equals a certain string.

For example:

<a href="url" title="export"></a>

What I've been trying to do is grab the href of the first <a> tag whose title is "export".

  • If I use soup.select("a[title='export']"), I end up finding all tags that satisfy this requirement, not just the first.
  • If I use find("a", {"title":"export"}), with the condition that the title should equal "export", it grabs the actual items inside the tag, not the href.
  • If I write .get("href") after calling find(), I get None back.

I've been searching the documentation and Stack Overflow for an answer but have yet to find one. Does anyone know a solution to this? Thank you!

2 answers:

Answer 0 (score: 4)

  

  What I've been trying to do is grab the href of the first that is found whose title is "export".

You're almost there. Once you have the tag, all you need to do is index it to get the href. Here is a slightly more bulletproof version:

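try:
    # find() returns the first matching tag, or None if there is no match
    url = soup.find('a', {'title': 'export'})['href']
    print(url)
except TypeError:
    # no matching <a> was found, so indexing None raised TypeError
    pass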
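
For anyone who wants to try this outside their own page, here is a minimal self-contained sketch of the same idea. The sample markup and its URLs are made up for illustration; only the title="export" pattern comes from the question. It shows two ways to get just the first match: find() with an attribute dict, and select_one() with the CSS selector from the question. Calling .get('href') on the tag returns None when the attribute is missing instead of raising.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; only the title="export" pattern is from the question.
html = """
<a href="/home" title="home">Home</a>
<a href="/export/first" title="export">Export 1</a>
<a href="/export/second" title="export">Export 2</a>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching Tag, or None if nothing matches.
first = soup.find("a", {"title": "export"})
href_from_find = first.get("href") if first is not None else None

# select_one() is the single-result counterpart of select().
first_css = soup.select_one("a[title='export']")
href_from_select = first_css.get("href") if first_css is not None else None

print(href_from_find)
print(href_from_select)
```

If .get("href") still comes back as None on a real page, one possible cause is that the first <a title="export"> in the document simply has no href attribute, so it may be worth printing the tag itself to check.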

Answer 1 (score: 0)

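Following on from the same topic: I have an HTML file and only want to extract the patent number and title of each citation from its tags. I tried the code below, but it prints every title in the HTML file, whereas I specifically want only the ones under citations.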

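from bs4 import BeautifulSoup

url = 'https://patents.google.com/patent/EP1208209A1/en?oq=medicinal+chemistry'
# html_file is assumed to be an already-open file holding the page saved from url
patent = html_file.read()
#print(patent)
soup = BeautifulSoup(patent, 'html.parser')
x = soup.select('tr[itemprop="backwardReferences"]')
# this selects from the whole document, so every title is printed, not just citations
y = soup.select('td[itemprop="title"]')
print(y)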
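
One possible way to limit the titles to citations is to run the second selector on each matched row instead of on the whole soup. This is only a sketch under assumptions: the fragment below is made up, the real Google Patents markup may differ, and the publicationNumber itemprop used for the patent number is an assumption rather than something confirmed in this thread.

```python
from bs4 import BeautifulSoup

# Made-up fragment for illustration; the real page structure is an assumption.
html = """
<table>
  <tr itemprop="backwardReferences">
    <td><span itemprop="publicationNumber">US0000001A</span></td>
    <td itemprop="title">Cited patent title</td>
  </tr>
  <tr itemprop="otherRow">
    <td itemprop="title">Unrelated title</td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

citations = []
for row in soup.select('tr[itemprop="backwardReferences"]'):
    # Search inside this row only, not the whole document.
    number = row.select_one('[itemprop="publicationNumber"]')
    title = row.select_one('td[itemprop="title"]')
    citations.append((
        number.get_text(strip=True) if number else None,
        title.get_text(strip=True) if title else None,
    ))

print(citations)
```

If only the titles are needed, a single descendant selector such as soup.select('tr[itemprop="backwardReferences"] td[itemprop="title"]') should give the same scoping without the loop.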