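I am trying to extract some links that themselves contain other links, and I am struggling to collect the output of the for loop into a single list, as described below.
My code: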
import requests
from bs4 import BeautifulSoup

pages = ['https://pagetoscrape.com/?page=1',
         'https://pagetoscrape.com/?page=2',
         'https://pagetoscrape.com/?page=3'
         ]
for u in pages:
    response = requests.get(u)
    data = response.content
    soup = BeautifulSoup(data, 'lxml')
    for links in soup.find_all('div', class_='item-to-scrape'):
        link = links.a['href']
        print(link)
Output:
pagetoscrape.com/url1
pagetoscrape.com/url2
pagetoscrape.com/url3
pagetoscrape.com/url4
pagetoscrape.com/url5
pagetoscrape.com/url6
pagetoscrape.com/url7
...
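How can I get a list like this, so that I can use it later for other operations in the same way as the pages list (for example, another loop that picks the links one by one)?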
['pagetoscrape.com/url1', 'pagetoscrape.com/url2', 'pagetoscrape.com/url3', 'pagetoscrape.com/url4', ...]
Answer 0 (score: 1)
Build the list with a list comprehension (this collects the links from a single parsed page, soup):
link_list = [links.a['href']
             for links in soup.find_all('div', class_='item-to-scrape')]
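The comprehension above covers one soup object, so inside the page loop it would be rebuilt on every pass. Here is a minimal sketch of collecting the links from several pages in a single nested comprehension. It assumes the page HTML has already been downloaded: the html_pages strings are made-up stand-ins for response.content, and html.parser is used instead of lxml only to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins for the HTML of two downloaded pages.
html_pages = [
    '<div class="item-to-scrape"><a href="pagetoscrape.com/url1">1</a></div>'
    '<div class="item-to-scrape"><a href="pagetoscrape.com/url2">2</a></div>',
    '<div class="item-to-scrape"><a href="pagetoscrape.com/url3">3</a></div>',
]

# Outer loop walks the pages, inner loop walks the matching divs of each page.
link_list = [
    item.a['href']
    for html in html_pages
    for item in BeautifulSoup(html, 'html.parser').find_all('div', class_='item-to-scrape')
]

print(link_list)
```

With real pages, the outer loop would iterate over the URLs and parse requests.get(u).content instead of the sample strings.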
Answer 1 (score: 0)
If I understand correctly, you want something like this.
import requests
from bs4 import BeautifulSoup

pages = ['https://pagetoscrape.com/?page=1',
         'https://pagetoscrape.com/?page=2',
         'https://pagetoscrape.com/?page=3'
         ]
urls = []  # collects the links from every page
for u in pages:
    response = requests.get(u)
    data = response.content
    soup = BeautifulSoup(data, 'lxml')
    for links in soup.find_all('div', class_='item-to-scrape'):
        link = links.a['href']
        urls.append(link)
        print(link)
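Once the loop finishes, urls is an ordinary list and can be iterated just like pages. The sketch below is one possible variation, not the answerer's code: it moves the extraction into a helper (extract_links is a hypothetical name), skips any div that has no a tag so that links.a['href'] cannot fail on None, and then reuses the list in a second loop. The sample_pages strings are invented stand-ins for downloaded HTML.

```python
from bs4 import BeautifulSoup


def extract_links(html):
    """Return the href of the first <a> inside each 'item-to-scrape' div."""
    soup = BeautifulSoup(html, 'html.parser')
    return [
        item.a['href']
        for item in soup.find_all('div', class_='item-to-scrape')
        # Skip divs without an <a> tag, or whose <a> has no href attribute.
        if item.a is not None and item.a.has_attr('href')
    ]


# Hypothetical stand-ins for the HTML of two downloaded pages.
sample_pages = [
    '<div class="item-to-scrape"><a href="pagetoscrape.com/url1">1</a></div>'
    '<div class="item-to-scrape">no link here</div>',
    '<div class="item-to-scrape"><a href="pagetoscrape.com/url2">2</a></div>',
]

urls = []
for html in sample_pages:
    # extend adds each link individually, keeping urls a flat list.
    urls.extend(extract_links(html))

# The collected list can now drive a second loop, one link at a time.
for url in urls:
    print(url)
```

Using extend rather than append matters here: append would add each page's list as a single nested element instead of flattening the links into one list.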