Question

I am completely new to Python and BeautifulSoup. I want to get the link from the href. Unfortunately, the anchor also contains other, irrelevant data.

Any help is much appreciated.

<a href="/link-i-want/to-get.html">
<li class="cat-list-row1 clearfix">
<img align="left" alt="Do not need!" src="https://do.not/need/.jpg" style="margin-right: 20px;" width="40%"/>
<h3>
<p class="subline">Do not need</p>	Do not need!				</h3>
<span class="tag-body">
<p>Do not need</p>...				</span>
<div style="clear:both;"></div>
</li>
</a>

Answer 1

You can extract an attribute's value using [] brackets.

For example, to extract the alt value of the img tag, use image_example = soup.find('img') and then print(image_example['alt']).
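A minimal runnable version of that img/alt example, assuming a trimmed-down copy of the question's HTML (the html string below is illustrative, not part of the original answer):

```python
from bs4 import BeautifulSoup

# Illustrative, trimmed-down version of the HTML from the question
html = '<a href="/link-i-want/to-get.html"><img alt="Do not need!" src="https://do.not/need/.jpg"/></a>'
soup = BeautifulSoup(html, 'html.parser')

image_example = soup.find('img')  # first <img> tag in the document
print(image_example['alt'])       # Do not need!
print(image_example['src'])       # https://do.not/need/.jpg
```

The same [] lookup works on any tag object returned by find().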

Updated code:

from bs4 import BeautifulSoup

data = '''
    <a href="/link-i-want/to-get.html">
    <li class="cat-list-row1 clearfix">
    <img align="left" alt="Do not need!" src="https://do.not/need/.jpg" style="margin-right: 20px;" width="40%"/>
    <h3>
    <p class="subline">Do not need</p>  Do not need!                </h3>
    <span class="tag-body">
    <p>Do not need</p>...               </span>
    <div style="clear:both;"></div>
    </li>
    </a>
'''
soup = BeautifulSoup(data, 'html.parser')
url_address = soup.find('a')['href']
print(url_address)  # Output: /link-i-want/to-get.html
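find() only returns the first match. If a page contains several such anchors, a sketch using find_all() collects every href instead; the two-anchor data string here is hypothetical, made up for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical page with two anchors; only the href values are wanted
data = '''
<a href="/first/link.html"><li><h3>Do not need</h3></li></a>
<a href="/second/link.html"><li><h3>Do not need</h3></li></a>
'''
soup = BeautifulSoup(data, 'html.parser')

# href=True keeps only <a> tags that actually carry an href attribute
links = [a['href'] for a in soup.find_all('a', href=True)]
print(links)  # ['/first/link.html', '/second/link.html']
```

Filtering with href=True means the [] lookup inside the list comprehension cannot hit a missing attribute.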

The general pattern is: soup.find('<tag>')['<attribute-name>'].
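Applying that pattern to a few tag/attribute pairs from the question's HTML (shortened here for illustration) shows one detail worth knowing: class is treated as a multi-valued attribute, so it comes back as a list rather than a string.

```python
from bs4 import BeautifulSoup

# Shortened copy of the question's HTML, for illustration only
html = '''
<a href="/link-i-want/to-get.html">
<li class="cat-list-row1 clearfix">
<img alt="Do not need!" src="https://do.not/need/.jpg"/>
</li>
</a>
'''
soup = BeautifulSoup(html, 'html.parser')

print(soup.find('a')['href'])    # /link-i-want/to-get.html
print(soup.find('img')['src'])   # https://do.not/need/.jpg
# 'class' is multi-valued, so BeautifulSoup splits it into a list
print(soup.find('li')['class'])  # ['cat-list-row1', 'clearfix']
```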

We can also use the .get(attr) method: soup.find('<tag>').get('<attr>').
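The practical difference between the two forms shows up when the attribute is missing: [] raises KeyError, while .get() returns None (or a default you pass). A small sketch with a made-up second anchor that has no href:

```python
from bs4 import BeautifulSoup

# Made-up input: the second <a> has no href attribute
soup = BeautifulSoup('<a href="/link-i-want/to-get.html">text</a><a>no link</a>', 'html.parser')
first, second = soup.find_all('a')

print(first.get('href'))               # /link-i-want/to-get.html
print(second.get('href'))              # None: missing attribute, no exception
print(second.get('href', 'no-href'))   # no-href (the supplied default)

try:
    second['href']                     # [] lookup on a missing attribute
except KeyError as err:
    print('KeyError:', err)
```

Use [] when the attribute must be present and a failure should be loud; use .get() when missing attributes are expected.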

Reference: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#quick-start

How to get only the link from an <a href> that contains li elements, using BeautifulSoup?