Can wget work from a list of files and rename the destination files?

Time: 2016-11-01 18:17:36

Tags: wget

I have this wget command:

sudo wget --user-agent='some-agent' --referer=http://some-referrer.html -N -r -nH --cut-dirs=x --timeout=xxx --directory-prefix=/directory/for/downloaded/files -i list-of-files-to-download.txt

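-N checks whether a newer version of the file is available before downloading.

-r enables recursive retrieval.

-nH disables generation of host-prefixed directories.

--cut-dirs=x ignores the first x remote directory components, so those subdirectories are not created locally.

--timeout=xxx sets the timeout :)

--directory-prefix stores the files in the desired directory.

This works fine, no problem.

Now, on to the problem.

Let's say my list-of-files-to-download.txt contains these files: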

http://website/directory1/picture-same-name.jpg
http://website/directory2/picture-same-name.jpg
http://website/directory3/picture-same-name.jpg
etc...

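You can see the problem: on the second download, wget sees that we already have picture-same-name.jpg, so it won't download the second one or any of the following files with the same name. I can't mirror the directory structure because I need all the downloaded files in the same directory. I can't use the -O option because it conflicts with -N, which I need. I have tried -nd, but it doesn't seem to work for me.

So, ideally, I need to be able to:

a. wget from the list the way I do now, keeping my parameters.

b. Put all the files in the same directory and be able to rename each file.

Does anyone have a solution for this?

Thanks in advance.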

1 Answer:

Answer 0: (score: 0)

I would suggest two approaches:

  1. Use the "-nc" or "--no-clobber" option. From the man page:
      -nc
      --no-clobber
          If a file is downloaded more than once in the same directory, Wget's behavior depends on a few options, including -nc.  In certain cases, the local file will be
          clobbered, or overwritten, upon repeated download.  In other cases it will be preserved.

          When running Wget without -N, -nc, -r, or -p, downloading the same file in the same directory will result in the original copy of file being preserved and the second copy
          being named file.1.  If that file is downloaded yet again, the third copy will be named file.2, and so on.  (This is also the behavior with -nd, even if -r or -p are in
          effect.)  When -nc is specified, this behavior is suppressed, and Wget will refuse to download newer copies of file.  Therefore, "no-clobber" is actually a misnomer in
          this mode---it's not clobbering that's prevented (as the numeric suffixes were already preventing clobbering), but rather the multiple version saving that's prevented.

          When running Wget with -r or -p, but without -N, -nd, or -nc, re-downloading a file will result in the new copy simply overwriting the old.  Adding -nc will prevent this
          behavior, instead causing the original version to be preserved and any newer copies on the server to be ignored.

          When running Wget with -N, with or without -r or -p, the decision as to whether or not to download a newer copy of a file depends on the local and remote timestamp and
          size of the file.  -nc may not be specified at the same time as -N.

          A combination with -O/--output-document is only accepted if the given output file does not exist.

          Note that when -nc is specified, files with the suffixes .html or .htm will be loaded from the local disk and parsed as if they had been retrieved from the Web.
    

    As this man page entry shows, the behavior can be unpredictable or unexpected. You will need to check whether it works for you.

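  2. The other approach is to use a bash script. I am most comfortable with bash on *nix, so please excuse the platform dependency. The logic is sound, though, and with a few modifications you can use it on other platforms or in other scripting languages.

     Example bash script (pseudocode):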

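      # Read one URL per line; this avoids the word splitting of `for i in $(cat ...)`.
      while IFS= read -r i; do
        # Placeholders: substitute your own flags and the file name you want.
        wget <all your flags except the -i flag> "$i" -O /path/to/custom/directory/filename
      done < list-of-files-to-download.txt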
      

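      You can modify the script to download each file to a temporary file, parse $i to get the file name from the URL, check whether that file already exists on disk, and then decide what name to give the temporary file when you rename it.

      This gives you much finer control over your downloads.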
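The temporary-file idea described above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the helper names `flat_name` and `fetch_all` and the naming scheme (joining the URL path components with `_`) are assumptions made for illustration, and the script assumes the list contains one URL per line.

```shell
#!/bin/sh
# flat_name URL: print a local file name built from the URL path, e.g.
# http://website/directory1/picture-same-name.jpg -> directory1_picture-same-name.jpg
# (hypothetical naming scheme; adjust it to whatever names you want).
flat_name() {
  fn_path=${1#*://}          # drop the scheme, e.g. "http://"
  fn_path=${fn_path#*/}      # drop the host, up to the first "/"
  fn_path=${fn_path%%\?*}    # drop any query string
  printf '%s\n' "$fn_path" | tr '/' '_'
}

# fetch_all LIST DEST: download every URL in LIST into DEST under its flat name.
fetch_all() {
  while IFS= read -r fa_url; do
    [ -n "$fa_url" ] || continue                     # skip blank lines
    fa_tmp=$(mktemp "$2/.part.XXXXXX") || return 1   # temporary download target
    # Add your own flags here (user agent, referer, timeout).
    if wget -q -O "$fa_tmp" "$fa_url"; then
      fa_target=$2/$(flat_name "$fa_url")
      if [ -f "$fa_target" ] && cmp -s "$fa_tmp" "$fa_target"; then
        rm -f "$fa_tmp"                # identical to the existing copy: keep the old file
      else
        mv -f "$fa_tmp" "$fa_target"   # new or changed: install under the flat name
      fi
    else
      rm -f "$fa_tmp"                  # download failed: discard the partial file
    fi
  done < "$1"
}
```

It could be called as `fetch_all list-of-files-to-download.txt /directory/for/downloaded/files`. Note that -N has no effect when combined with -O, so in this sketch the `cmp` check stands in for the timestamp comparison: every file is fetched again on each run, but an unchanged local copy is left untouched.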