Question

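I recently started using BeautifulSoup for web scraping. I am trying to extract all of the artist names from the first page of the National Gallery of Art website.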

Here is my code:

import requests
from bs4 import BeautifulSoup

data = requests.get('https://www.nga.gov/Collection/artists.html?pageNumber=1')
soup = BeautifulSoup(data.content, 'html.parser')
soup.find_all('a')

When I do this, I get every link on the page except the ones that contain the artist names.

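For example, for the artist "Greek A" Factory, this is what I found using Chrome's Inspect option: "Greek A" Factory. However, I cannot find anything like it in the soup object I created. Can you tell me what I am doing wrong here?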

Answer 1

Try this:

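from selenium import webdriver
from bs4 import BeautifulSoup
import time

# Load the page in a real browser so that its scripts run
driver = webdriver.Chrome()
driver.get('https://www.nga.gov/Collection/artists.html?pageNumber=1')
time.sleep(5)  # crude fixed pause to let the page finish rendering
# Parse the rendered HTML (the 'lxml' parser needs the lxml package installed)
soup = BeautifulSoup(driver.page_source, 'lxml')
driver.quit()

# Each artist name is the text of a link inside an element with class "title"
for artist_name in soup.select('.title a'):
    print(artist_name.text)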

Partial output:

"Greek A" Factory
2 Bit Comics
7 Freds Press
A. B.
Aachen, Hans von
Aarland, Johann Carl Wilhelm
Abakanowicz, Magdalena
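
A likely explanation, which is not stated above and should be treated as an assumption, is that the artist links are added by JavaScript after the page loads, so they are missing from the HTML that requests downloads. If that is the case, the fixed time.sleep(5) can probably be replaced with an explicit wait. The sketch below is one way to do it; the helper names fetch_rendered_html and extract_artist_names are made up for illustration, and the '.title a' selector is simply the one used in the code above.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, timeout=10):
    """Load url in Chrome and return the HTML once a '.title a' link exists."""
    # Imported here so the parsing helper below works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Wait up to `timeout` seconds for at least one matching link,
        # instead of always sleeping for a fixed time.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, ".title a"))
        )
        return driver.page_source
    finally:
        # Close the browser even if the wait times out.
        driver.quit()


def extract_artist_names(html):
    """Return the text of every link matched by the '.title a' selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [a.get_text(strip=True) for a in soup.select(".title a")]
```

To run it against the live page, pass the result of fetch_rendered_html('https://www.nga.gov/Collection/artists.html?pageNumber=1') to extract_artist_names and print each name. Keeping the parsing step separate also makes it easy to check the selector against a saved copy of the page.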
