I want to scrape each artist's name, link, years, and nationality. This is the code I tried:
import requests
import csv
from bs4 import BeautifulSoup
import bs4

f = csv.writer(open('z_artist_names_assignment.csv', 'w'))
f.writerow(['N'])

pages = []
for i in range(1, 2):
    url = 'https://web.archive.org/web/20121007172955/https://www.nga.gov/collection/anZ' + str(i) + '.htm'
    pages.append(url)

for item in pages:
    page = requests.get(item, timeout=10)
    soup = BeautifulSoup(page.text, 'html.parser')

    last_links = soup.find(class_='AlphaNav')
    last_links.decompose()

    artist_name_list = soup.find(class_='BodyText')
    artist_name_list_items = artist_name_list.find_all('a')

    nationality_list = soup.find(class_='BodyText')
    nationality_list_items = nationality_list.find_all('td')

    for artist_name in artist_name_list_items:
        names = artist_name.contents[0]
        links = 'https://web.archive.org' + artist_name.get('href')

    for nationality in nationality_list_items:
        nationality = nationality.contents[0]
        print(nationality)
print(nationality) returns not only the nationality text but also the artist names, including their tags:
<a href="/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=11630">Zabaglia, Niccola</a>
Italian, 1664 - 1750
<a href="/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=34202">Zaccone, Fabian</a>
American, 1910 - 1992
<a href="/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=3475">Zadkine, Ossip</a>
French, 1890 - 1967
I only want 'Italian, 1664 - 1750', or 'Italian', or '1664 - 1750'. How can I get these values using the contents method?
Here is the HTML:
<tr valign="top"><td><a href="/web/20121007172955/http://www.nga.gov/cgi-bin/tsearch?artistid=3452">Zalce, Alfredo</a></td><td>Mexican, born 1908</td></tr>
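Answer 0 (score: 0)
I think it is better to find all the 'tr' elements that contain the artist information, rather than the 'td' elements. Each row holds two cells: the first contains the link with the artist's name, and the second contains the nationality and years, so iterating over every 'td' mixes the two.
Here is an example. Hope it helps!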
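The row-based approach suggested in this answer might look roughly like the sketch below. It is not the answerer's original code. The function name parse_artist_rows, the SAMPLE_HTML string, and its wrapping table element are assumptions made for illustration. The first sample row is the HTML from the question; the second is rebuilt from the question's printed output. Against the live page you would pass page.text instead.

```python
from bs4 import BeautifulSoup

# Hypothetical sample input. Row 1 is copied from the question's HTML;
# row 2 is reconstructed from the question's printed output.
SAMPLE_HTML = """
<table>
<tr valign="top"><td><a href="/web/20121007172955/http://www.nga.gov/cgi-bin/tsearch?artistid=3452">Zalce, Alfredo</a></td><td>Mexican, born 1908</td></tr>
<tr valign="top"><td><a href="/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=11630">Zabaglia, Niccola</a></td><td>Italian, 1664 - 1750</td></tr>
</table>
"""


def parse_artist_rows(html, base='https://web.archive.org'):
    """Return one dict per table row that looks like an artist entry."""
    soup = BeautifulSoup(html, 'html.parser')
    artists = []
    for row in soup.find_all('tr'):
        cells = row.find_all('td')
        # An artist row has exactly two cells, and the first one holds a link.
        link = cells[0].find('a') if len(cells) == 2 else None
        if link is None:
            continue  # skip layout rows and anything that is not an artist entry
        # The second cell holds text such as "Italian, 1664 - 1750".
        details = cells[1].get_text(strip=True)
        # Split on the first comma only: nationality before it, years after it.
        nationality, _, years = details.partition(',')
        artists.append({
            'name': link.get_text(strip=True),
            'link': base + link.get('href', ''),
            'nationality': nationality.strip(),
            'years': years.strip(),
        })
    return artists


if __name__ == '__main__':
    for artist in parse_artist_rows(SAMPLE_HTML):
        print(artist['name'], '|', artist['nationality'], '|', artist['years'])
```

Because each row is handled as a unit, the name, link, and nationality stay paired, which also makes it straightforward to write them to the CSV file as one row per artist.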