Create a text file to save the data from each scraped URL

Date: 2019-12-30 20:04:26

Tags: python web-scraping

Here is what I have done so far:

import requests
from bs4 import BeautifulSoup

for url in url_list:
    print(url)
    # Prepend http:// if the URL has no scheme
    if url[:4] != 'http':
        url = 'http://' + url

    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    print(soup)

with open("copy.txt", "w") as file:
    file.write(str(soup))

I want to create one text file per scraped URL. At the moment, everything is saved to a single file.

2 Answers:

Answer 0 (score: 1)

Open the file with a different name on each iteration of the for loop.

file_id = 0
for url in url_list:
    print(url)
    # Prepend http:// if the URL has no scheme
    if url[:4] != 'http':
        url = 'http://' + url

    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    print(soup)
    # Save to "copy_1.txt", "copy_2.txt", etc.
    file_id += 1
    with open(f"copy_{file_id}.txt", "w") as file:
        file.write(str(soup))
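
As a small design note, the manual counter can also be replaced with Python's built-in enumerate. A minimal sketch, assuming the same url_list, requests, and BeautifulSoup setup from the question:

# A minimal sketch using enumerate() instead of a manual counter;
# assumes requests, BeautifulSoup and url_list from the question are in scope.
for i, url in enumerate(url_list, start=1):
    if url[:4] != 'http':
        url = 'http://' + url

    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")

    # Writes copy_1.txt, copy_2.txt, ...
    with open(f"copy_{i}.txt", "w") as file:
        file.write(str(soup))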

Answer 1 (score: 0)

You're almost there. You just need to create the file inside the for loop and add an identifier to the filename so that it doesn't get overwritten.

for url in url_list:
    # print(url)

    # Look up the index before the URL is modified below; otherwise
    # list.index() would fail for URLs that get the http:// prefix added.
    # I would use an identifier from the url, however you can use an index instead.
    url_identifier = url_list.index(url)

    # Prepend http:// if the URL has no scheme
    if url[:4] != 'http':
        url = 'http://' + url

    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    # print(soup)

    with open(f"copy_{url_identifier}.txt", "w") as file:
        file.write(str(soup))
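
For the "identifier from the url" idea mentioned in this answer, one possible sketch derives the filename from the URL itself with urllib.parse. The make_filename helper below is hypothetical and not part of either answer, and it assumes url_list, requests, and BeautifulSoup from the question are already in scope:

# A possible sketch for naming each file after the URL it came from;
# make_filename is a hypothetical helper, not part of the original answers.
import re
from urllib.parse import urlparse

def make_filename(url):
    parsed = urlparse(url)
    # Combine host and path, then replace characters that are unsafe in filenames
    raw = parsed.netloc + parsed.path
    return re.sub(r'[^A-Za-z0-9._-]+', '_', raw).strip('_') or 'page'

for url in url_list:
    if url[:4] != 'http':
        url = 'http://' + url

    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")

    # e.g. http://example.com/about -> example.com_about.txt
    with open(f"{make_filename(url)}.txt", "w") as file:
        file.write(str(soup))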