Question

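I'm trying to scrape some links from pages that contain other links, and I'm struggling to get the output of a for loop into a single list, as described below.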

My code:

import requests
from bs4 import BeautifulSoup

pages = ['https://pagetoscrape.com/?page=1',
         'https://pagetoscrape.com/?page=2',
         'https://pagetoscrape.com/?page=3'
        ]

for u in pages:
    response = requests.get(u)
    data = response.content
    soup = BeautifulSoup(data, 'lxml')
    # Each matching div wraps the anchor whose href we want
    for links in soup.find_all('div', class_='item-to-scrape'):
        link = links.a['href']
        print(link)

Output:

pagetoscrape.com/url1
pagetoscrape.com/url2
pagetoscrape.com/url3
pagetoscrape.com/url4
pagetoscrape.com/url5
pagetoscrape.com/url6
pagetoscrape.com/url7
...

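How can I get a list like the one below, so that I can use it later for other operations in the same way as the `pages` list (another loop that picks the links one by one)?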

['pagetoscrape.com/url1', 'pagetoscrape.com/url2', 'pagetoscrape.com/url3', 'pagetoscrape.com/url4', ...]

Answer 1

Build the list with a list comprehension:

link_list = [
    links.a['href']
    for links in soup.find_all('div', class_='item-to-scrape')
]
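
Note that the comprehension above covers a single `soup`, so inside the loop over `pages` it would be rebuilt on every iteration. One possible way to end up with a single flat list is to keep each page's list and flatten them afterwards. This is only a minimal sketch: `links_per_page` is hypothetical sample data standing in for the per-page results, not something from the question.

```python
from itertools import chain

# Hypothetical sample data: one inner list per page, standing in for what
# the comprehension would produce for each page's soup.
links_per_page = [
    ['pagetoscrape.com/url1', 'pagetoscrape.com/url2'],
    ['pagetoscrape.com/url3'],
    ['pagetoscrape.com/url4', 'pagetoscrape.com/url5'],
]

# Flatten the per-page lists into one list, preserving order.
link_list = list(chain.from_iterable(links_per_page))

# An equivalent nested list comprehension.
link_list_alt = [link for page_links in links_per_page for link in page_links]

print(link_list)
```

Calling `link_list.extend(...)` with the comprehension inside the loop over `pages` should give the same result without the intermediate list of lists.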

Answer 2

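If I understand correctly, you want something like this: create an empty list before the loop and append each link to it.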

import requests
from bs4 import BeautifulSoup

pages = ['https://pagetoscrape.com/?page=1',
         'https://pagetoscrape.com/?page=2',
         'https://pagetoscrape.com/?page=3'
        ]

# Created once, outside the loop, so it accumulates links from every page
urls = []

for u in pages:
    response = requests.get(u)
    data = response.content
    soup = BeautifulSoup(data, 'lxml')
    for links in soup.find_all('div', class_='item-to-scrape'):
        link = links.a['href']
        urls.append(link)  # collect the link instead of only printing it
        print(link)
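
Once `urls` is filled, it can be looped over just like `pages`. The sketch below uses sample data in place of the scraped list and assumes, purely for illustration, that the same link might show up on more than one page; the question does not say whether that happens.

```python
# Sample data standing in for the collected `urls` list; the repeated
# entry is an assumption made to illustrate de-duplication.
urls = [
    'pagetoscrape.com/url1',
    'pagetoscrape.com/url2',
    'pagetoscrape.com/url1',
    'pagetoscrape.com/url3',
]

# dict.fromkeys keeps the first occurrence of each key in insertion order,
# so this drops repeats without reordering the list.
unique_urls = list(dict.fromkeys(urls))

# Pick the links one by one, the same way the first loop walks `pages`.
for position, url in enumerate(unique_urls, start=1):
    print(position, url)
```

If duplicates are not a concern, iterating over `urls` directly works the same way.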

For loop output into a single list
