Question

I'm trying to use wget in Python to download the links listed in a txt file. What should I use to do this?

I'm using the wget Python module.

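import requests
from bs4 import BeautifulSoup

url = "https://google.com/"  # placeholder: the page being scraped is not shown in the question
r = requests.get(url)
html = r.text

soup = BeautifulSoup(html, 'html.parser')
s = "https://google.com/"  # prefix written in front of every href

# Open output.txt once, instead of reopening it for every link
with open("output.txt", "a") as f:
    for link in soup.find_all('a'):
        href = link.get('href')
        if href:  # skip <a> tags that have no href (get() returns None)
            print(s + href, file=f)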

So far, I can only create the text file and then run wget.exe on it from the command prompt. I'd like to be able to do all of this in one step.
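Prepending `s` to every href only works when each href is relative and does not start with a slash. `/about` would become `https://google.com//about`, and an absolute href such as `https://example.com/a.zip` would become `https://google.com/https://example.com/a.zip`. A minimal sketch, assuming the goal is just to turn each href into a full URL, is to resolve it against the page URL with the standard library's `urllib.parse.urljoin`. The helper name `absolute_links` is hypothetical.

```python
from urllib.parse import urljoin


def absolute_links(page_url, hrefs):
    """Resolve each href against page_url, skipping missing or empty values.

    absolute_links is a hypothetical helper, not part of any library.
    """
    links = []
    for href in hrefs:
        if not href:  # <a> tags without an href give None from .get('href')
            continue
        links.append(urljoin(page_url, href))  # absolute hrefs pass through unchanged
    return links


hrefs = ["/about", "files/report.pdf", "https://example.com/a.zip", None]
print(absolute_links("https://google.com/", hrefs))
# ['https://google.com/about', 'https://google.com/files/report.pdf', 'https://example.com/a.zip']
```

With BeautifulSoup, `hrefs` would be `(a.get('href') for a in soup.find_all('a'))`.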

Answer 1

Since you're already using the third-party requests library, just use it for the downloads too:

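import requests
from os.path import basename

with open('output.txt') as urls:
    for url in urls:
        url = url.strip()  # each line ends with '\n', which must not be part of the URL
        if not url:
            continue  # skip blank lines
        response = requests.get(url)
        filename = basename(url)
        with open(filename, 'wb') as output:
            output.write(response.content)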

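This code makes a number of assumptions:

The end of each URL must be a unique name, because basename is used to build the name of the downloaded file. For example, basename('https://i.imgur.com/7ljexwX.gifv') gives '7ljexwX.gifv'.
The content is assumed to be binary rather than text, so the output file is opened with 'wb', meaning "write binary".
The response is not checked for errors.
If the content is large, all of it is loaded into memory before being written to the output file, which may not be very efficient. There are other questions on this site that address this.
I also haven't actually tried running this code.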
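A sketch that handles the last two points and softens the first, still using requests. It builds the filename from the URL's path only (so a query string such as `?raw=1` does not end up in the name), calls `raise_for_status()` so an HTTP error page is not silently saved to disk, and uses `stream=True` with `iter_content()` so large files are written in chunks instead of being held in memory. The helper names `filename_from_url`, `download` and `download_all`, the 8192-byte chunk size, the 30-second timeout and the `download.bin` fallback name are all assumptions; it still assumes the last path segment of each URL is unique.

```python
import os
from urllib.parse import urlsplit

import requests


def filename_from_url(url, default="download.bin"):
    """Take the last segment of the URL path, ignoring any query string or fragment."""
    name = os.path.basename(urlsplit(url).path)
    return name or default  # e.g. "https://example.com/" has an empty last segment


def download(url, dest_dir=".", chunk_size=8192):
    """Stream url into a file in dest_dir and return the path written."""
    path = os.path.join(dest_dir, filename_from_url(url))
    with requests.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()  # raise on 4xx/5xx instead of saving the error page
        with open(path, "wb") as output:
            for chunk in response.iter_content(chunk_size=chunk_size):
                output.write(chunk)
    return path


def download_all(list_path, dest_dir="."):
    """Download every non-blank URL listed in list_path, reporting failures instead of stopping."""
    saved = []
    with open(list_path) as urls:
        for line in urls:
            url = line.strip()  # drop the trailing newline
            if not url:
                continue
            try:
                saved.append(download(url, dest_dir))
            except requests.RequestException as exc:
                print(f"failed: {url}: {exc}")
    return saved
```

Calling `download_all('output.txt')` would then fetch every link the question's script wrote, continuing past any URL that fails.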

How do I download the files from all the links in the text list in a single run?
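To do everything in one run, the scraping and downloading steps can be joined so the links never have to pass through output.txt. A minimal sketch, assuming the files to fetch are the `<a href>` targets on a single page (as in the question) and that requests and BeautifulSoup are installed. The function name `download_links_from_page` is hypothetical; non-HTTP links (such as `mailto:`) and links whose path ends in `/` are skipped, and the download itself is the simple in-memory one from the answer plus a `raise_for_status()` check.

```python
import os
from urllib.parse import urljoin, urlsplit

import requests
from bs4 import BeautifulSoup


def download_links_from_page(page_url, dest_dir="."):
    """Fetch page_url, then download every linked file into dest_dir."""
    page = requests.get(page_url, timeout=30)
    page.raise_for_status()
    soup = BeautifulSoup(page.text, "html.parser")

    saved = []
    for a in soup.find_all("a", href=True):  # href=True skips <a> tags without an href
        link = urljoin(page_url, a["href"])  # resolve relative hrefs against the page URL
        parts = urlsplit(link)
        if parts.scheme not in ("http", "https"):
            continue  # e.g. mailto: or javascript: links
        name = os.path.basename(parts.path)
        if not name:
            continue  # links ending in "/" have no usable filename
        response = requests.get(link, timeout=30)
        response.raise_for_status()
        path = os.path.join(dest_dir, name)
        with open(path, "wb") as output:
            output.write(response.content)
        saved.append(path)
    return saved
```

Any HTTP error on a linked file stops the run here; wrapping the per-link request in `try`/`except requests.RequestException` would let it continue instead.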
