Question

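I'm trying to download a large amount of text with Python, and I'd like to save all of it to a single file.

The code I'm using right now creates a separate file for each URL. It loops over an archive of URLs, requests the data, and saves each response to its own file.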

filename = archive[i]
urllib.request.urlretrieve(url, path + filename + ".pgn")

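I tried using the same filename for every URL, but that just overwrites the file.

Is there a way to loop over the archive and, instead of saving each response to its own file, append each block of text to a single file? Or do I just have to loop through all the files afterwards and concatenate them?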

Answer 1

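The Python documentation says this about urlretrieve:

If you wish to retrieve a resource via URL and store it in a temporary location, you can do so via the urlretrieve() function

So if you want to append the retrieved data to a file instead, you can use urlopen.

Like this: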

import urllib.request

filename = "MY_FILE_PATH"
#-----------inside your i loop-------------
with urllib.request.urlopen(url) as response:
    data = response.read()  # read() returns bytes
    # "ab" appends the raw bytes; writing str(data) would store the b'...' repr instead
    with open(filename + ".pgn", "ab") as fp:
        fp.write(data)
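
If you would rather append decoded text than raw bytes, a minimal sketch could look like the following. The helper name append_text_from_url and the UTF-8 fallback are assumptions made for illustration, not something from the question.

```python
import urllib.request


def append_text_from_url(url, dest_path, default_encoding="utf-8"):
    """Download url and append its decoded text to dest_path (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        # Use the charset declared in the response headers when there is one;
        # falling back to UTF-8 is an assumption about the data.
        charset = response.headers.get_content_charset() or default_encoding
        text = response.read().decode(charset)
    # "a" appends in text mode and creates the file on first use.
    with open(dest_path, "a", encoding=default_encoding) as fp:
        fp.write(text)
    return len(text)
```

Calling it once per URL inside the loop, always with the same dest_path, keeps adding to the end of that one file.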

Answer 2

Note that urlretrieve might become deprecated at some point in the future, so use urlopen instead.

import urllib.request
import shutil

...

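filename = archive[i]  # for one combined file, use a single fixed path here instead
with urllib.request.urlopen(url) as response, open(filename, 'ab') as out_file:
    # 'ab' appends the raw response bytes rather than overwriting the file
    shutil.copyfileobj(response, out_file)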
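
Putting the loop and the append together, a rough sketch of the whole thing might be the following. The function name append_all and the newline separator between downloads are assumptions for illustration; urls stands in for however you build the URLs from your archive.

```python
import shutil
import urllib.request


def append_all(urls, dest_path, separator=b"\n"):
    """Append the body of every URL in urls to the single file dest_path (hypothetical helper)."""
    # Open the destination once; 'ab' keeps whatever is already in the file.
    with open(dest_path, "ab") as out_file:
        for url in urls:
            with urllib.request.urlopen(url) as response:
                # Stream in chunks so a large download is never held in memory whole.
                shutil.copyfileobj(response, out_file)
            # Assumed separator so consecutive downloads do not run together.
            out_file.write(separator)
```

Opening the output file once outside the loop avoids reopening it for every URL, and it means there is no need for a second pass that concatenates separate files.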

How do I make urllib.request append to an existing file?
