I'm working with Beautiful Soup and am trying to grab the first tag on a page that has a given attribute equal to a certain string.
For example:
<a href="url" title="export"></a>
What I've been trying to do is grab the href of the first tag that is found whose title is "export".
If I use soup.select("a[title='export']"), I end up with all the tags that satisfy this requirement, not just the first.
If I use find("a", {"title":"export"}), with the condition set so that the title should equal "export", it grabs the actual items inside the tag, not the href.
If I write .get("href") after calling find(), I get None back.
I've been searching the documentation and Stack Overflow for an answer but have yet to find one. Does anyone know a solution to this? Thank you!
Answer 0 (score: 4)
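What I've been trying to do is grab the href of the first tag that is found whose title is "export".
You're almost there. Once you have the tag, all you need to do is index it to get the href. Here is a more bulletproof version: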
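try:
    # find() returns the first matching tag, or None when nothing matches
    url = soup.find('a', {'title': 'export'})['href']
    print(url)
except TypeError:
    # No matching tag: indexing None raises TypeError
    pass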
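For comparison, here is a minimal sketch of two ways to get only the first match without relying on the exception. The sample HTML and the variable names are made up for illustration; select_one() and find() both return None when nothing matches, so the check is explicit.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the asker's page
html = '''
<a href="first.html" title="export"></a>
<a href="second.html" title="export"></a>
<a href="other.html" title="import"></a>
'''
soup = BeautifulSoup(html, "html.parser")

# CSS selector variant: select_one() returns only the first match (or None)
tag = soup.select_one('a[title="export"]')
href_css = tag.get("href") if tag is not None else None

# find() variant: note the attribute name is a quoted string key
tag2 = soup.find("a", attrs={"title": "export"})
href_find = tag2.get("href") if tag2 is not None else None

# When nothing matches, find() gives back None rather than raising
missing = soup.find("a", attrs={"title": "no-such-title"})

print(href_css, href_find, missing)
```

If .get("href") still comes back as None on a real page, it may be worth printing the tag itself to check that the matched element actually carries an href attribute.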
Answer 1 (score: 0)
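Following the same topic of HTML files, I just want to find the patent number and title of the citations from the HTML tags. I tried the code below, but it prints every title in the HTML file, whereas I specifically want only the ones under the citations.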
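from bs4 import BeautifulSoup

url = 'https://patents.google.com/patent/EP1208209A1/en?oq=medicinal+chemistry'
# html_file is assumed to be an already-open file holding the page saved from url
patent = html_file.read()
#print(patent)
soup = BeautifulSoup(patent, 'html.parser')
x = soup.select('tr[itemprop="backwardReferences"]')
# This searches the whole document, so it returns every title, not only the citations
y = soup.select('td[itemprop="title"]')
print(y)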
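One possible way to keep only the citation titles is to search inside each backwardReferences row instead of the whole document. This is a sketch against made-up sample markup: the itemprop="publicationNumber" cell and the table layout are assumptions about the page structure, so the selectors may need adjusting for the real file.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the structure described above (an assumption)
sample = '''
<table>
<tr itemprop="backwardReferences">
  <td><span itemprop="publicationNumber">US1234567A</span></td>
  <td itemprop="title">Cited compound A</td>
</tr>
<tr itemprop="forwardReferences">
  <td><span itemprop="publicationNumber">US7654321B2</span></td>
  <td itemprop="title">Later compound B</td>
</tr>
</table>
'''
soup = BeautifulSoup(sample, "html.parser")

citations = []
# Iterate over the citation rows only, then select within each row
for row in soup.select('tr[itemprop="backwardReferences"]'):
    number = row.select_one('[itemprop="publicationNumber"]')
    title = row.select_one('td[itemprop="title"]')
    citations.append((
        number.get_text(strip=True) if number else None,
        title.get_text(strip=True) if title else None,
    ))

print(citations)
```

Calling select_one() on the row rather than on soup is what limits the search to that row's descendants, so titles from other sections are skipped.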