Targeting an <a> with a specific attribute using BeautifulSoup

Date: 2015-11-25 21:45:38

Tags: python web-scraping beautifulsoup

I'm attempting to scrape a page that has a section like this:

<a name="id_631"></a>

<hr>

<div class="store-class">
    <div>
        <span><strong>Store City</strong></span>
    </div>

    <div class="store-class-content">
        <p>Event listing</p>
        <p>Event listing2</p>
        <p>Event listing3</p>
    </div>

    <div>
        Stuff about contact info
    </div>
</div>

The page is a list of sections like that and the only way to differentiate them is by the name attribute in the <a> tag.

So I'm thinking I want to target that then go to the next_sibling to get the <hr> then again to the next sibling to get the <div class="store-class"> section. All I want is the info in that div tag.

I'm not sure how to target that <a> tag to move down two siblings though. When I try print(soup.find_all('a', {"name":"id_631"})) that just gives me what's in the tag, which is nothing.

Here's my script:

import requests
from bs4 import BeautifulSoup

r = requests.get("http://www.tandyleather.com/en/leathercraft-classes")

soup = BeautifulSoup(r.text, 'html.parser')

print(soup.find("a", id="id_631").find_next_sibling("div", class_="store-class"))

But I get the error:

Traceback (most recent call last):
  File "tandy.py", line 8, in <module>
    print(soup.find("a", id="id_631").find_next_sibling("div", class_="store-class"))
AttributeError: 'NoneType' object has no attribute 'find_next_sibling'

1 answer:

Answer 0 (score: 5)

find_next_sibling() to the rescue:

soup.find("a", attrs={"name": "id_631"}).find_next_sibling("div", class_="store-class")

Also, html.parser has to be replaced with either lxml or html5lib.
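
For illustration, here is a minimal offline sketch of the same lookup. It parses a trimmed copy of the markup from the question instead of fetching the live page, so html.parser is enough here; on the real page the parser switch mentioned above may still be needed. The `anchor`, `store`, `content` and `events` names and the None checks are my own additions, not part of the original script. The attribute has to go through attrs because `name` is also the first parameter of find() (the tag name), and the question's id="id_631" lookup returns None simply because the tag has a name attribute, not an id.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup shown in the question (one section only).
html = """
<a name="id_631"></a>
<hr>
<div class="store-class">
    <div><span><strong>Store City</strong></span></div>
    <div class="store-class-content">
        <p>Event listing</p>
        <p>Event listing2</p>
        <p>Event listing3</p>
    </div>
    <div>Stuff about contact info</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# "name" is find()'s own first parameter (the tag name), so the HTML
# attribute called "name" is passed through the attrs dictionary.
anchor = soup.find("a", attrs={"name": "id_631"})

# find_next_sibling() skips the <hr> and whitespace and returns the first
# following sibling that matches. Guard against a missing anchor.
store = anchor.find_next_sibling("div", class_="store-class") if anchor else None

events = []
if store is not None:
    content = store.find("div", class_="store-class-content")
    if content is not None:
        events = [p.get_text(strip=True) for p in content.find_all("p")]

print(events)
```

Checking for None before chaining is what avoids the AttributeError from the question when a section id is not present on the page.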
