I have this wget command:
sudo wget --user-agent='some-agent' --referer=http://some-referrer.html -N -r -nH --cut-dirs=x --timeout=xxx --directory-prefix=/directory/for/downloaded/files -i list-of-files-to-download.txt
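-N checks whether there is a newer file to download.
-r turns on recursive retrieval.
-nH disables the generation of host-prefixed directories.
--cut-dirs=X skips the first X directory components of the remote path, so those subdirectories are not created.
--timeout=xxx, well, sets the timeout :)
--directory-prefix stores the files in the directory I want.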
This works great, no problem.
Now, on to the problem:
Let's say my list-of-files-to-download.txt contains these files:
http://website/directory1/picture-same-name.jpg
http://website/directory2/picture-same-name.jpg
http://website/directory3/picture-same-name.jpg
etc...
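You can see the problem: on the second download, wget sees that we already have picture-same-name.jpg, so it won't download the second one or any of the following files with the same name. I can't mirror the directory structure, because I need all the downloaded files in the same directory. I can't use the -O option, because it conflicts with -N, which I need. I tried -nd, but it doesn't seem to work for me.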
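To see which files will collide before downloading anything, a small helper can list the file names (last path component of each URL) that occur more than once in the list. This is a sketch assuming bash plus the standard basename, sort and uniq tools; the function name find_collisions is hypothetical.

```shell
#!/usr/bin/env bash
# Print every file name (last path component of a URL) that occurs more
# than once in the given list, i.e. the files that would clash when all
# downloads are saved into one flat directory.
find_collisions() {
  local url
  # "|| [ -n "$url" ]" also handles a last line without a trailing newline.
  while IFS= read -r url || [ -n "$url" ]; do
    [ -n "$url" ] || continue   # skip blank lines
    basename "$url"
  done < "$1" | sort | uniq -d
}
```

For example, find_collisions list-of-files-to-download.txt on a list holding just the three example URLs prints picture-same-name.jpg once.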
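So, ideally, I need to be able to:
a.- wget from the list the way I do now, keeping my parameters.
b.- put all the files in the same directory, and be able to rename each file.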
Does anyone have a solution for this?
Thanks in advance.
Answer
I'd suggest two approaches. First, the -nc / --no-clobber option, described in the wget man page as follows:
-nc, --no-clobber

If a file is downloaded more than once in the same directory, Wget's behavior depends on a few options, including -nc. In certain cases, the local file will be clobbered, or overwritten, upon repeated download. In other cases it will be preserved.

When running Wget without -N, -nc, -r, or -p, downloading the same file in the same directory will result in the original copy of file being preserved and the second copy being named file.1. If that file is downloaded yet again, the third copy will be named file.2, and so on. (This is also the behavior with -nd, even if -r or -p are in effect.) When -nc is specified, this behavior is suppressed, and Wget will refuse to download newer copies of file. Therefore, "no-clobber" is actually a misnomer in this mode: it's not clobbering that's prevented (as the numeric suffixes were already preventing clobbering), but rather the multiple version saving that's prevented.

When running Wget with -r or -p, but without -N, -nd, or -nc, re-downloading a file will result in the new copy simply overwriting the old. Adding -nc will prevent this behavior, instead causing the original version to be preserved and any newer copies on the server to be ignored.

When running Wget with -N, with or without -r or -p, the decision as to whether or not to download a newer copy of a file depends on the local and remote timestamp and size of the file. -nc may not be specified at the same time as -N.

A combination with -O/--output-document is only accepted if the given output file does not exist.

Note that when -nc is specified, files with the suffixes .html or .htm will be loaded from the local disk and parsed as if they had been retrieved from the Web.
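As you can see from this man page entry, the behavior can be unpredictable or unexpected, so you'll need to check whether it works for your case. Note in particular that -nc cannot be combined with -N, which you said you need.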
Second, loop over the list in a bash script and choose the output name yourself, for example:
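while IFS= read -r url; do
  # Hypothetical naming scheme: http://website/directory2/picture-same-name.jpg -> directory2-picture-same-name.jpg
  name=$(printf '%s' "$url" | cut -d/ -f4- | tr '/' '-')
  # Your flags minus -i, -N, -r, -nH, --cut-dirs and --directory-prefix: -N conflicts with -O, and -O now sets the full output path
  wget --user-agent='some-agent' --referer=http://some-referrer.html --timeout=xxx \
    -O "/directory/for/downloaded/files/$name" "$url"
done < list-of-files-to-download.txt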
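You could extend the script to download each file to a temporary file, parse the URL to get its file name, check whether a file with that name already exists on disk, and then decide what name to give the temporary file when you rename it.
This gives you much finer control over your downloads.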
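A minimal sketch of that temporary-file approach, assuming bash and a mktemp that accepts a path template (GNU and BSD both do). The function names next_free_name, fetch_one and fetch_all are hypothetical, the --timeout=60 value is an assumed stand-in for the question's xxx placeholder, and the numeric suffixes (.1, .2, ...) simply mimic the naming the man page describes for wget's own non-overwriting mode. -N is left out because, as noted in the question, it conflicts with -O.

```shell
#!/usr/bin/env bash
# Sketch: download each URL to a temp file, then move it into one flat
# directory under a name that does not collide with existing files.

# Print the first unused path among DIR/BASE, DIR/BASE.1, DIR/BASE.2, ...
next_free_name() {
  local dir=$1 base=$2 candidate n=1
  candidate="$dir/$base"
  while [ -e "$candidate" ]; do
    candidate="$dir/$base.$n"
    n=$((n + 1))
  done
  printf '%s\n' "$candidate"
}

# Download one URL into DIR without overwriting a same-named file.
fetch_one() {
  local url=$1 dir=$2 tmp target
  # Temp file in the target directory, so the final mv is a simple rename.
  tmp=$(mktemp "$dir/.download.XXXXXX") || return 1
  # Flags from the question minus -i, -N, -r and the directory options;
  # the timeout of 60 seconds is an assumed value.
  if wget --user-agent='some-agent' --referer=http://some-referrer.html \
       --timeout=60 -O "$tmp" "$url"; then
    target=$(next_free_name "$dir" "$(basename "$url")")
    mv -- "$tmp" "$target"
  else
    rm -f -- "$tmp"   # discard partial or failed downloads
    return 1
  fi
}

# Process every URL in LIST (one per line) into DIR.
fetch_all() {
  local list=$1 dir=$2 url
  while IFS= read -r url || [ -n "$url" ]; do
    [ -n "$url" ] || continue
    fetch_one "$url" "$dir"
  done < "$list"
}
```

Usage would look like fetch_all list-of-files-to-download.txt /directory/for/downloaded/files, which should leave picture-same-name.jpg, picture-same-name.jpg.1 and picture-same-name.jpg.2 side by side. Because next_free_name is a separate function, swapping in a different naming rule (for example, prefixing the parent directory name) only touches one place.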