Collecting for-loop output into a single list

Time: 2019-09-18 22:00:40

Tags: python web-scraping beautifulsoup

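I am trying to extract some links that themselves contain other links, and I am struggling to get the output of the for loop into a single list, as described below.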

My code:

import requests
from bs4 import BeautifulSoup

pages = ['https://pagetoscrape.com/?page=1',
         'https://pagetoscrape.com/?page=2',
         'https://pagetoscrape.com/?page=3'
        ]

for u in pages:
    # Download and parse one listing page.
    response = requests.get(u)
    data = response.content
    soup = BeautifulSoup(data, 'lxml')
    # Each matching div wraps the link to extract.
    for links in soup.find_all('div', class_='item-to-scrape'):
        link = links.a['href']
        print(link)

Output:

pagetoscrape.com/url1
pagetoscrape.com/url2
pagetoscrape.com/url3
pagetoscrape.com/url4
pagetoscrape.com/url5
pagetoscrape.com/url6
pagetoscrape.com/url7
...

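How can I get a list like the one below, so that I can use it later for other operations in the same way as the pages list (another loop that picks the links one by one)?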

['pagetoscrape.com/url1', 'pagetoscrape.com/url2', 'pagetoscrape.com/url3', 'pagetoscrape.com/url4', ...]

2 answers:

Answer 0 (score: 1)

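Build the list with a list comprehension. Note that this collects the links from one soup object, that is, from a single page: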

link_list = [links.a['href']
             for links in soup.find_all('div', class_='item-to-scrape')]
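Because the comprehension above reads from a single soup, running it after the question's loop would only see the last page. The following is a minimal sketch of one way to accumulate links across all pages with list.extend. The fake_pages dict and its HTML snippets are made up for illustration and stand in for requests.get(u).content. The 'html.parser' parser is used instead of 'lxml' only so that the sketch needs no extra parser installed.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded pages: URL -> HTML body.
# In the real scraper each body would come from requests.get(u).content.
fake_pages = {
    'https://pagetoscrape.com/?page=1':
        '<div class="item-to-scrape"><a href="pagetoscrape.com/url1">1</a></div>'
        '<div class="item-to-scrape"><a href="pagetoscrape.com/url2">2</a></div>',
    'https://pagetoscrape.com/?page=2':
        '<div class="item-to-scrape"><a href="pagetoscrape.com/url3">3</a></div>'
        '<div class="other"><a href="pagetoscrape.com/ignored">x</a></div>',
}

link_list = []
for u, html in fake_pages.items():
    soup = BeautifulSoup(html, 'html.parser')
    # extend() adds this page's links to the running list instead of replacing it.
    link_list.extend(
        item.a['href']
        for item in soup.find_all('div', class_='item-to-scrape')
        if item.a is not None  # skip matching divs that contain no <a> tag
    )

print(link_list)
```

The guard on item.a is an addition to the original answer: without it, a matching div with no link inside would raise a TypeError when subscripted.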

Answer 1 (score: 0)

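If I understand correctly, you want something like this: create an empty list before the loop and append each link to it.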

import requests
from bs4 import BeautifulSoup

pages = ['https://pagetoscrape.com/?page=1',
         'https://pagetoscrape.com/?page=2',
         'https://pagetoscrape.com/?page=3'
        ]

urls = []  # collects the links from every page

for u in pages:
    response = requests.get(u)
    data = response.content
    soup = BeautifulSoup(data, 'lxml')
    # Each matching div wraps the link to extract.
    for links in soup.find_all('div', class_='item-to-scrape'):
        link = links.a['href']
        urls.append(link)
        print(link)
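Once the loop has finished, urls is an ordinary list and can drive a second loop, which is what the question asks for. The sketch below assumes the same link might appear on more than one page and removes repeats first. That deduplication step is an assumption and not part of the original answer, and the sample urls are hypothetical values in the shape shown in the question's output.

```python
# Hypothetical result of the scraping loop; one link appears twice on purpose.
urls = [
    'pagetoscrape.com/url1',
    'pagetoscrape.com/url2',
    'pagetoscrape.com/url1',
    'pagetoscrape.com/url3',
]

# dict.fromkeys keeps the first occurrence of each key in insertion order,
# so this drops duplicates without reordering the links.
unique_urls = list(dict.fromkeys(urls))

# The list can now drive a second loop, one link at a time.
for position, link in enumerate(unique_urls, start=1):
    print(position, link)
```

If the order of the links does not matter, set(urls) would also remove duplicates, but it does not preserve the order in which the links were scraped.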