Question

我正在制作一个网络刮刀，现在我已经拥有它，所以它抓住了一个网址列表。我需要它使用列表中的每个url，它一次一个地进入汤功能，从每个页面获得我想要的html输出。

示例：

my_list = ['www.google1213.com', 'www.yahoo123.com', 'www.apples123.com']

def main():

    url = input('URL: ') #List goes here
    currentDT = datetime.datetime.now() 
    scraper = cfscrape.create_scraper() 
    response = scraper.get(url).content
    soup = BeautifulSoup(response,"lxml")
    #etc...#

while True:
main()

如果有人可以帮助我将我的列表发送给我的内容，那么我每次都会抓一个网址，我会非常感激！

Answer 1

def main():
    for url in my_list:
        currentDT = datetime.now()
        scraper = cfscrape.create_scraper()
        response = scraper.get(url).content
        soup = BeautifulSoup(response,"lxml")

Answer 2

您可以使用简单的for循环：

for url in my_list:
    print(url)
    # do your scrapping stuff...

Ps：也许你也应该每秒限制你的请求。否则，一些网站会在几次尝试后阻止你。

Python / BS4 - 在功能中使用列表作为输入

2 个答案: