Question

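I am trying to scrape the last part of this page with BeautifulSoup in Python.

I want to retrieve all of the companies listed at the bottom. The companies are sorted alphabetically: those whose names fall in "A-F" appear under the first tab, "G-N" under the second tab, and so on. You have to click a tab to display its names, so I will loop over the different "name pages" and apply the same code to each.

However, I cannot retrieve all of the names on a single page. When looking at the companies under "A-F", I can only retrieve the names in the first column of the table.

My code is: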

from bs4 import BeautifulSoup as Soup
from bs4.element import NavigableString, Tag
import requests

incl_page_url = "https://www.triodos.com/en/investment-management/socially-responsible-investment/sustainable-investment-universe/companies-atmf1/"
page = requests.get(incl_page_url)
soup = Soup(page.content, "html.parser")

# Walk the nodes that follow the first <h2> at the same level and print their text.
for header in soup.find("h2").next_siblings:
    if not isinstance(header, Tag):
        continue  # bare strings between tags have no children
    for a in header.children:
        if type(a) is NavigableString:  # plain text nodes only
            print(str(a))

As you can see by running it, I only get the names from the first column. Any help is much appreciated.

Answer 1

Give this a try and tell me if it isn't what you wanted:

from bs4 import BeautifulSoup
import requests

incl_page_url = "https://www.triodos.com/en/investment-management/socially-responsible-investment/sustainable-investment-universe/companies-atmf1/"
page = requests.get(incl_page_url).text
soup = BeautifulSoup(page, "lxml")  # needs the lxml package installed
# Select every <p> under any element with class "splitColumn".
for items in soup.select(".splitColumn p"):
    title = '\n'.join(items.strings)
    print(title)

Result:

3iGroup
8point3 Energy Partners  
A
ABN AMRO
Accell Group
Accsys Technologies
Achmea
Acuity Brands
Adecco
Adidas
Adobe Systems
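
To see why the two approaches can differ, here is a minimal offline sketch. The markup in SAMPLE_HTML is hypothetical (the real page's structure is not shown in this thread); it only assumes that each column is a separate element with class "splitColumn", as the selector above suggests. The helper names names_via_siblings and names_via_selector are made up for illustration. Under that assumption, next_siblings only ever sees nodes inside the first column's container, while the CSS selector matches paragraphs in every column.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Hypothetical markup: two column containers, with the heading in the first one only.
SAMPLE_HTML = """
<div class="splitColumn">
  <h2>A-F</h2>
  <p>3i Group<br>ABN AMRO</p>
</div>
<div class="splitColumn">
  <p>Adidas<br>Adobe Systems</p>
</div>
"""


def names_via_siblings(soup):
    """Collect text nodes from the tags that follow the first <h2> at the same level."""
    names = []
    for sibling in soup.find("h2").next_siblings:
        if not isinstance(sibling, Tag):
            continue  # whitespace strings between tags have no children
        for child in sibling.children:
            if isinstance(child, NavigableString) and child.strip():
                names.append(child.strip())
    return names


def names_via_selector(soup):
    """Collect text from every <p> under any element with class "splitColumn"."""
    names = []
    for paragraph in soup.select(".splitColumn p"):
        names.extend(paragraph.stripped_strings)
    return names


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(names_via_siblings(soup))   # ['3i Group', 'ABN AMRO']
print(names_via_selector(soup))   # ['3i Group', 'ABN AMRO', 'Adidas', 'Adobe Systems']
```

The sibling walk stops at the end of the first container because siblings share a parent; the second column's paragraph has a different parent, so it is never visited.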

Scraping a page with BS in Python only captures the first column of splitColumn
