I have done the following:
import requests
from bs4 import BeautifulSoup

for url in url_list:
    print(url)
    # Prepend 'http://' if the URL has no scheme, then send the request
    if url[:4] != 'http':
        url = 'http://' + url
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    print(soup)
    with open("copy.txt", "w") as file:
        file.write(str(soup))
I want to create one text file for each scraped URL. Currently everything is saved into a single file.
Answer 0 (score: 1)
Open the file with a different name on each iteration of the for loop.
id = 0
for url in url_list:
    print(url)
    # Prepend 'http://' if the URL has no scheme, then send the request
    if url[:4] != 'http':
        url = 'http://' + url
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    print(soup)
    # Save to "copy_1.txt", "copy_2.txt", etc.
    id += 1
    with open(f"copy_{id}.txt", "w") as file:
        file.write(str(soup))
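If you would rather not maintain the counter by hand, enumerate gives the same numbering automatically. A minimal sketch of that variant, assuming the same url_list and the imports shown in the question:

for id, url in enumerate(url_list, start=1):
    if url[:4] != 'http':
        url = 'http://' + url
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    # Each page goes to its own file: copy_1.txt, copy_2.txt, ...
    with open(f"copy_{id}.txt", "w") as file:
        file.write(str(soup))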
Answer 1 (score: 0)
You're almost there. You just need to open the file inside the for loop and add an identifier to the file name so that it does not get overwritten.
for url in url_list:
    # print(url)
    # Prepend 'http://' if the URL has no scheme, then send the request
    if url[:4] != 'http':
        url = 'http://' + url
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    # print(soup)
    # I would use an identifier derived from the url, but an index works as well.
    url_identifier = url_list.index(url)
    with open(f"copy_{url_identifier}.txt", "w") as file:
        file.write(str(soup))
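Note that url_list.index(url) returns the position of the first match, so duplicate URLs would end up writing to the same file. If you want the identifier to come from the URL itself, as the comment above suggests, one option is to strip the scheme and replace characters that are not valid in file names. A minimal sketch under the same assumptions; the filename_for helper is hypothetical and not part of the original code:

import re

def filename_for(url):
    # Hypothetical helper: turn a URL into a safe file name,
    # e.g. "http://example.com/page" -> "example.com_page.txt"
    name = re.sub(r'^https?://', '', url)
    name = re.sub(r'[^A-Za-z0-9._-]+', '_', name)
    return name + ".txt"

for url in url_list:
    if url[:4] != 'http':
        url = 'http://' + url
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    with open(filename_for(url), "w") as file:
        file.write(str(soup))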