Question

The layout is as follows:

<div class="App">
    <div class="content">
        <div class="title">Application Name #1</div>
        <div class="image" style="background-image: url(https://img_url)">
        </div>
        <a href="http://app_url" class="signed button">install app</a>
    </div>
</div>

I am trying to grab the TITLE and then the APP_URL. Ideally, when I print the result as HTML, I want the TITLE to be a hyperlink to the APP_URL.

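My code looks like this, but it does not produce the desired result. I believe I need to add another command inside the loop to get the title. The only question is: how do I make sure I grab the TITLE and the APP_URL together, so that they stay paired? There are at least 15 apps with the class <div class="App">, and of course I want all 15 results.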

Important: for the href link, I need the one from the class named "signed button".

soup = BeautifulSoup(example)
for div in soup.findAll('div', {'class': 'App'}):
    a = div.findAll('a')[1]
    print a.text.strip(), '=>', a.attrs['href']

Answer 1

Use CSS selectors:

from bs4 import BeautifulSoup

html = """
<div class="App">
    <div class="content">
        <div class="title">Application Name #1</div>
        <div class="image" style="background-image: url(https://img_url)">
        </div>
        <a href="http://app_url" class="signed button">install app</a>
    </div>
</div>"""

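# 'html5lib' needs the html5lib package installed; 'html.parser' also works here
soup = BeautifulSoup(html, 'html5lib')

for div in soup.select('div.App'):
    # Title and link are looked up inside the same App block, so they stay paired
    title = div.select_one('div.title')
    # Match only the anchor that carries both the "signed" and "button" classes
    link = div.select_one('a.signed.button')

    print("Click here: <a href='{}'>{}</a>".format(link["href"], title.text))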

Which yields:

Click here: <a href='http://app_url'>Application Name #1</a>
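
Since the question asks for all 15 results, here is a minimal sketch of the same selector approach that collects the (title, href) pairs into a list before rendering them. The sample HTML, the second app, the extra "info" link, and the helper names extract_apps and render_links are made up for illustration; it assumes the built-in 'html.parser' so no extra parser package is needed.

```python
from html import escape

from bs4 import BeautifulSoup

# Hypothetical sample: two App blocks, the second with an extra link to ignore.
SAMPLE_HTML = """
<div class="App">
    <div class="content">
        <div class="title">Application Name #1</div>
        <a href="http://app_url_1" class="signed button">install app</a>
    </div>
</div>
<div class="App">
    <div class="content">
        <div class="title">Application Name #2</div>
        <a href="http://other_url" class="info">details</a>
        <a href="http://app_url_2" class="signed button">install app</a>
    </div>
</div>
"""


def extract_apps(html):
    """Return a list of (title, href) tuples, one per complete App block."""
    soup = BeautifulSoup(html, "html.parser")
    apps = []
    for div in soup.select("div.App"):
        title = div.select_one("div.title")
        # Only the anchor carrying both classes counts as the install link.
        link = div.select_one("a.signed.button")
        if title is None or link is None or not link.has_attr("href"):
            continue  # skip blocks that are missing a title or an install link
        apps.append((title.get_text(strip=True), link["href"]))
    return apps


def render_links(apps):
    """Turn (title, href) pairs into HTML anchors, escaping both parts."""
    return [
        '<a href="{}">{}</a>'.format(escape(href, quote=True), escape(title))
        for title, href in apps
    ]


if __name__ == "__main__":
    for line in render_links(extract_apps(SAMPLE_HTML)):
        print(line)
```

Because both lookups happen on the same div, each title can only ever be paired with the link from its own App block, and blocks missing either piece are skipped rather than raising an error.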

Answer 2

Maybe something like this would work?

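from bs4 import BeautifulSoup

# `example` is the HTML string from the question
soup = BeautifulSoup(example, 'html.parser')
for div in soup.find_all('div', {'class': 'App'}):
    a = div.find_all('a')[0]  # first link inside this App block
    print(div.find_all('div', {'class': 'title'})[0].text, '=>', a.attrs['href'])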
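
Taking the first link works for the markup shown, but the question specifically wants the link with the "signed button" class. A rough sketch of one way to do that with find() instead of CSS selectors follows. The sample HTML (with an extra "help" link and the two classes in reversed order) and the helper names has_signed_button and title_and_href are assumptions for illustration. A function filter is used because, as far as I know, class_='signed button' only matches that exact attribute string, so it would miss class="button signed".

```python
from bs4 import BeautifulSoup

# Hypothetical sample: an extra link comes first, and the classes are reversed.
SAMPLE_HTML = """
<div class="App">
    <div class="content">
        <div class="title">Application Name #1</div>
        <a href="http://help_url" class="button">help</a>
        <a href="http://app_url" class="button signed">install app</a>
    </div>
</div>
"""


def has_signed_button(tag):
    """True for <a> tags carrying both classes, in any order."""
    classes = tag.get("class", [])
    return tag.name == "a" and "signed" in classes and "button" in classes


def title_and_href(html):
    """Return (title, href) pairs for every App block that has both parts."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for div in soup.find_all("div", class_="App"):
        title = div.find("div", class_="title")
        link = div.find(has_signed_button)
        if title is not None and link is not None:
            results.append((title.get_text(strip=True), link.get("href")))
    return results


if __name__ == "__main__":
    for title, href in title_and_href(SAMPLE_HTML):
        print(title, "=>", href)
```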

beautifulsoup - Extract the link, text, and title within a child div
