I have done the following:
import requests
from bs4 import BeautifulSoup

for url in url_list:
    print(url)
    # Prepend 'http://' if the URL has no scheme, then send the request
    if url[:4] != 'http':
        url = 'http://' + url
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    print(soup)
    with open("copy.txt", "w") as file:
        file.write(str(soup))
I want to create one text file for each scraped URL. Currently everything is saved into a single file.
Answer 0 (score: 1)
Open the file with a different name on each iteration of the for loop.
id = 0
for url in url_list:
    print(url)
    # Prepend 'http://' if the URL has no scheme, then send the request
    if url[:4] != 'http':
        url = 'http://' + url
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    print(soup)
    # Save to "copy_1.txt", "copy_2.txt", etc.
    id += 1
    with open(f"copy_{id}.txt", "w") as file:
        file.write(str(soup))
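If you would rather not maintain the counter by hand, enumerate gives the same numbering automatically. A minimal sketch of that variant, assuming the same url_list and the imports shown in the question:

for id, url in enumerate(url_list, start=1):
    if url[:4] != 'http':
        url = 'http://' + url
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    # Each page goes to its own file: copy_1.txt, copy_2.txt, ...
    with open(f"copy_{id}.txt", "w") as file:
        file.write(str(soup))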
Answer 1 (score: 0)
You're almost there. You just need to open the file inside the for loop and add an identifier to the file name so that it does not get overwritten.
for url in url_list:
    # print(url)
    # Prepend 'http://' if the URL has no scheme, then send the request
    if url[:4] != 'http':
        url = 'http://' + url
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    # print(soup)
    # I would use an identifier derived from the url, but an index works as well.
    url_identifier = url_list.index(url)
    with open(f"copy_{url_identifier}.txt", "w") as file:
        file.write(str(soup))
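Note that url_list.index(url) returns the position of the first match, so duplicate URLs would end up writing to the same file. If you want the identifier to come from the URL itself, as the comment above suggests, one option is to strip the scheme and replace characters that are not valid in file names. A minimal sketch under the same assumptions; the filename_for helper is hypothetical and not part of the original code:

import re

def filename_for(url):
    # Hypothetical helper: turn a URL into a safe file name,
    # e.g. "http://example.com/page" -> "example.com_page.txt"
    name = re.sub(r'^https?://', '', url)
    name = re.sub(r'[^A-Za-z0-9._-]+', '_', name)
    return name + ".txt"

for url in url_list:
    if url[:4] != 'http':
        url = 'http://' + url
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    with open(filename_for(url), "w") as file:
        file.write(str(soup))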