Escaping characters in wget --content-disposition file naming

Date: 2015-10-16 19:50:47

Tags: windows bash shell sh wget

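There are many questions about content disposition, but none of them matches the problem I am having. I hope someone can help me solve it.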

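I want to download a large number of files with wget, and I use the --content-disposition option to get sensible file names. Unfortunately, when a file name contains special characters such as \ | / : ? " * < >, the file is downloaded under an escaped name.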

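Say the file I want to download is named Bussiness Insider: How to Launch Your Business. Note that the file name contains the special character :. When I run my script, wget does download the file, but the result is named only Bussiness Insider, has a size of zero, and has no file extension.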

I also tried other approaches, such as basename and --restrict-file-names=windows, but still had no luck.

Here is the script:

-O

1 Answer:

Answer 0: (score: 0)

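First try --restrict-file-names=nocontrol

If that does not work, this is what worked for me: --restrict-file-names=unix (because I am on a Linux machine, or using Bash/Cygwin on Windows).

You may need --restrict-file-names=windows

As you will notice, wget now downloads the file with the special characters in its name.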
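The three suggestions above can be tried one after another with a small wrapper. This is only a sketch: the function name fetch_with_mode, the WGET override variable, and the example URL are assumptions made for illustration and do not come from the original script.

```shell
# Sketch: run wget with --content-disposition and a chosen
# --restrict-file-names mode.
# WGET is a hypothetical override so the command can be previewed
# (for example WGET=echo) without downloading anything.
fetch_with_mode() {
  # $1 = mode (nocontrol, unix, or windows), $2 = URL to download
  "${WGET:-wget}" --content-disposition --restrict-file-names="$1" "$2"
}

# Example calls (the URL is a placeholder, not the asker's real link):
#   fetch_with_mode nocontrol "https://example.com/download?id=123"
#   fetch_with_mode unix      "https://example.com/download?id=123"
#   fetch_with_mode windows   "https://example.com/download?id=123"
```

Because the mode string is passed through unchanged, a comma-separated set such as windows,lowercase should also work, as the man page excerpt describes.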

       By default, Wget escapes the characters that are not valid or safe as part of file names on your operating system, as well as control characters that are typically unprintable.  This
       option is useful for changing these defaults, perhaps because you are downloading to a non-native partition, or because you want to disable escaping of the control characters, or you
       want to further restrict characters to only those in the ASCII range of values.

       The modes are a comma-separated set of text values. The acceptable values are unix, windows, nocontrol, ascii, lowercase, and uppercase. The values unix and windows are mutually
       exclusive (one will override the other), as are lowercase and uppercase. Those last are special cases, as they do not change the set of characters that would be escaped, but rather
       force local file paths to be converted either to lower- or uppercase.

       When "unix" is specified, Wget escapes the character / and the control characters in the ranges 0--31 and 128--159.  This is the default on Unix-like operating systems.

       When "windows" is given, Wget escapes the characters \, |, /, :, ?, ", *, <, >, and the control characters in the ranges 0--31 and 128--159.  In addition to this, Wget in Windows
       mode uses + instead of : to separate host and port in local file names, and uses @ instead of ? to separate the query portion of the file name from the rest.  Therefore, a URL that
       would be saved as www.xemacs.org:4300/search.pl?input=blah in Unix mode would be saved as www.xemacs.org+4300/search.pl@input=blah in Windows mode.  This mode is the default on
       Windows.

       **If you specify nocontrol, then the escaping of the control characters is also switched off. This option may make sense when you are downloading URLs whose names contain UTF-8
       characters, on a system which can save and display filenames in UTF-8 (some possible byte values used in UTF-8 byte sequences fall in the range of values designated by Wget as
       "controls").**

       The ascii mode is used to specify that any bytes whose values are outside the range of ASCII characters (that is, greater than 127) shall be escaped. This can be useful when saving
       filenames whose encoding does not match the one used locally.

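The excerpt above comes from the wget man page, which introduces the option as follows: --restrict-file-names=modes changes which characters found in remote URLs must be escaped during generation of local file names. Characters that are restricted by this option are escaped, i.e. replaced with %HH, where HH is the hexadecimal number that corresponds to the restricted character. This option may also be used to force all alphabetical cases to be either lower- or uppercase.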
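To make the %HH rule concrete, here is a rough emulation of how the characters listed for windows mode would be rewritten. It is not wget's own code: the function name escape_windows_mode is hypothetical, control characters are ignored, and the way wget treats names taken from a Content-Disposition header may differ from this sketch.

```shell
# Sketch: replace each character that the man page lists for "windows"
# mode with %HH, where HH is its hexadecimal ASCII code.
# Control characters (0-31, 128-159) are deliberately not handled here.
escape_windows_mode() {
  printf '%s\n' "$1" | sed \
    -e 's/\\/%5C/g' \
    -e 's/|/%7C/g' \
    -e 's|/|%2F|g' \
    -e 's/:/%3A/g' \
    -e 's/?/%3F/g' \
    -e 's/"/%22/g' \
    -e 's/\*/%2A/g' \
    -e 's/</%3C/g' \
    -e 's/>/%3E/g'
}

# Example:
#   escape_windows_mode 'Bussiness Insider: How to Launch Your Business'
#   prints: Bussiness Insider%3A How to Launch Your Business
```

Seeing the colon turn into %3A helps when checking whether a downloaded file was renamed by escaping or truncated for some other reason.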
