Errors when using Wget to download a single file instead of every file in the manifest

Time: 2018-08-11 00:24:15

Tags: wget

I get errors when I use Wget to download a single file instead of every file listed in the manifest.

I want to download files by following the instructions on this site:

https://s3-us-west-2.amazonaws.com/ai2-s2-research-public/open-corpus/index.html

I used the command they give:

wget -i https://s3-us-west-2.amazonaws.com/ai2-s2-research-public/open-corpus/manifest.txt

However, I only want some of the files, not every file in the manifest. I looked at the manifest file, and its contents look like this:

corpus-2018-05-03/s2-corpus-00.gz
corpus-2018-05-03/s2-corpus-01.gz
corpus-2018-05-03/s2-corpus-02.gz
corpus-2018-05-03/s2-corpus-03.gz
corpus-2018-05-03/s2-corpus-04.gz
corpus-2018-05-03/s2-corpus-05.gz

So I simply changed the command to this:

wget -i https://s3-us-west-2.amazonaws.com/ai2-s2-research-public/open-corpus/corpus-2018-05-03/s2-corpus-02.gz 

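The command ran fine at first, but once the file had finished downloading I got a stream of warnings and/or error messages. I am not sure what they mean. Here is the output: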

--2018-08-11 00:03:47--  https://s3-us-west-2.amazonaws.com/ai2-s2-research-public/open-corpus/corpus-2018-05-03/s2-corpus-02.gz
Resolving s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)... 52.218.128.152
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.218.128.152|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 996588773 (950M) [application/x-gzip]
Saving to: ‘s2-corpus-02.gz’

s2-corpus-02.gz     100%[===================>] 950.42M  38.5MB/s    in 25s     

2018-08-11 00:04:13 (37.5 MB/s) - ‘s2-corpus-02.gz’ saved [996588773/996588773]

�7�sa����=���xT���~��%����3X�M�|~�X^Z%\�?�`��Fx?�%��\���/�5/�$��P����g+�v�j: Bad port number
s2-corpus-02.gz: Invalid URL https://�*�b�:ۅF�Cg��$�Bj�H�gLM逖N�l���ZUV�[�;&mu�̸��&�y��X�%��;�˝1|)�$�d˝�: Bad port number
s2-corpus-02.gz: Invalid URL https://{Y1��&�������\�Y�Ey�Զ�:E3;ɜ Q!: Bad port number
n]��g: Invalid host name
%��)]kZ�R�e����� Ӡ�{)]��B��0��OV�%T��: Invalid host name
s2-corpus-02.gz: Invalid URL https://7�s�s����{���!ސ@: Invalid host name
s2-corpus-02.gz: Invalid URL https://���ݔ�v�G7NI:,J�����i�YKN�o�.e�N�z< R�  DZ$+4;!C�B���ZJ"�>��2�@`ǼU3��x��D�   bqh���: Bad port number
�5�3���݂5�LLT�]���j0)dv7:2�]�x���a���fv�#��$=!Y�ږ�9U �@H*�Ǹ: Bad port number
uc;�]*�m������:����o4Z�`c�#,U��ze"vrY;,!̝rF���aL�L��7�Ն-�zs�w;Zu\^����e��H��m��{ʪ*��l���O: Bad port number
s2-corpus-02.gz: Invalid URL https://�:�D����: Bad port number
ٶ����1�>g�y���=͛����hv���O�b�o��m���i��&��w��/���{�k|   �Q(zq��ϔ���: Bad port number
���^盩Y��'DIfe*��&��ƫO�|�80��湏��~9: Invalid host name
^zs��멨�u�o\?��#`x����{�>�˝�d��CI�C��4Fg������9j?�w�(X�N���7: Bad port number
s2-corpus-02.gz: Invalid URL https://��j�q(�Ur��1�KMq�1]��@d�aԌ����:�3�pEzbaj(��B��*}kK��ΊOu;B��V: Bad port number
s2-corpus-02.gz: Invalid URL https://�����`m���<�5��!;p3���~�`�)�Q���0�:!�n��`�r���D0ǖ�&r'�*.i�!��mM����n�oڀ�Zk�l�H1���t�: Bad port number
Incomplete or invalid multibyte sequence encountered
--2018-08-11 00:04:20--  https://s3-us-west-2.amazonaws.com/ai2-s2-research-public/open-corpus/corpus-2018-05-03/%1F%8B%08
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.218.128.152|:443... connected.
HTTP request sent, awaiting response... 400 Bad Request
2018-08-11 00:04:20 ERROR 400: Bad Request.

Incomplete or invalid multibyte sequence encountered
--2018-08-11 00:04:20--  https://s3-us-west-2.amazonaws.com/ai2-s2-research-public/open-corpus/corpus-2018-05-03/%1F%8B%08
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.218.128.152|:443... connected.
HTTP request sent, awaiting response... 400 Bad Request
2018-08-11 00:04:20 ERROR 400: Bad Request.

Incomplete or invalid multibyte sequence encountered
--2018-08-11 00:04:20--  https://s3-us-west-2.amazonaws.com/ai2-s2-research-public/open-corpus/corpus-2018-05-03/%C9E%DF%C4$%0C.eL%7B%93%82%F1J%04%C3m%14%8Dl%9Ckk%AB%1B%7C%9B%B4%17%A26m
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.218.128.152|:443... connected.
HTTP request sent, awaiting response... 400 Bad Request
2018-08-11 00:04:20 ERROR 400: Bad Request.

Incomplete or invalid multibyte sequence encountered
--2018-08-11 00:04:20--  https://s3-us-west-2.amazonaws.com/ai2-s2-research-public/open-corpus/corpus-2018-05-03/%C9E%DF%C4$%0C.eL%7B%93%82%F1J%04%C3m%14%8Dl%9Ckk%AB%1B%7C%9B%B4%17%A26m
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.218.128.152|:443... connected.
HTTP request sent, awaiting response... 400 Bad Request
2018-08-11 00:04:20 ERROR 400: Bad Request.

Warning: wildcards not supported in HTTP.
Incomplete or invalid multibyte sequence encountered
--2018-08-11 00:04:20--  https://s3-us-west-2.amazonaws.com/ai2-s2-research-public/open-corpus/corpus-2018-05-03/%D4%9F%B5%F8%C3j%86%86%DEm6%CB%F5%EF%CE%CF%D7qn1n~%ED%EF%FA%99]%9D%F5%AB%DB%F3%A5]%C6%D5%B9sF4%A2%B52%A5%E8%99%16%3Ey%E3%92%16%9C%7B%CB%A2%60%C2%0B%99l%AD%9E%D0C%AFB*%CF%C5%A7%3C%10_q%B7%DDn%EE%FA%15%8D%CF??Y%D8%3C%CA%DFn1]%F7%DB%EA*v%F9%81y%F0j&j%D90%F3%E4%1F%FF%F3%C9%EE%CA%AB%B3%8B%B3%EAzio%17%FD%AC%DF_+Ykpu%7Dp%ED7go%CE%AA%9B8_b%96'%97
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.218.128.152|:443... connected.
HTTP request sent, awaiting response... 400 Bad Request
2018-08-11 00:04:20 ERROR 400: Bad Request.

Warning: wildcards not supported in HTTP.
Incomplete or invalid multibyte sequence encountered
--2018-08-11 00:04:20--  https://s3-us-west-2.amazonaws.com/ai2-s2-research-public/open-corpus/corpus-2018-05-03/%D4%9F%B5%F8%C3j%86%86%DEm6%CB%F5%EF%CE%CF%D7qn1n~%ED%EF%FA%99]%9D%F5%AB%DB%F3%A5]%C6%D5%B9sF4%A2%B52%A5%E8%99%16%3Ey%E3%92%16%9C%7B%CB%A2%60%C2%0B%99l%AD%9E%D0C%AFB*%CF%C5%A7%3C%10_q%B7%DDn%EE%FA%15%8D%CF??Y%D8%3C%CA%DFn1]%F7%DB%EA*v%F9%81y%F0j&j%D90%F3%E4%1F%FF%F3%C9%EE%CA%AB%B3%8B%B3%EAzio%17%FD%AC%DF_+Ykpu%7Dp%ED7go%CE%AA%9B8_b%96'%97
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.218.128.152|:443... connected.
HTTP request sent, awaiting response... 400 Bad Request
2018-08-11 00:04:20 ERROR 400: Bad Request.

Warning: wildcards not supported in HTTP.
Incomplete or invalid multibyte sequence encountered
--2018-08-11 00:04:20--  https://s3-us-west-2.amazonaws.com/ai2-s2-research-public/open-corpus/corpus-2018-05-03/%D6H%95/%DD%CF%F7%BBr%C7%DB%D7o.%DF%BF%ADh2%AB%D3%F2%CF%EB%97/_Vo%BB%19%A6uu_]%F6%F3%F9v%D1%F92%C5%F8%B8Hq%15%17%3E%92H%D00%5E%B8%F5fe%FD%06%0F%7B%F9y%13%17%EB%7C]%EAW%D5%FB%EB%ABo%AAnQ%D9j%DE%BBn%16+%1BN%EF%20%B6q%F1%B1[%F5%8B9%C4%B9%BA%B3%1Fc%E5b%5CT!~%8C%B3~%19C%E5%EE%AB%CD],%B7%BF~y%F3M%F5%A9_%FDHb%7B%BB%EA%B7Kt%F0.%AEc%15%F7/%B3+%7C%9C%C7%D5-]d%D7U%A4)%D9tx%F2%AA,%84j=.%83%DC%B0%0D%9A%8B%1E%CD%AA%18nc%B5%88%1Bz%C1%FA%ACz%D5%7FB
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.218.128.152|:443... connected.
HTTP request sent, awaiting response... 400 Bad Request
2018-08-11 00:04:20 ERROR 400: Bad Request.

Warning: wildcards not supported in HTTP.
Incomplete or invalid multibyte sequence encountered
--2018-08-11 00:04:20--  https://s3-us-west-2.amazonaws.com/ai2-s2-research-public/open-corpus/corpus-2018-05-03/%D6H%95/%DD%CF%F7%BBr%C7%DB%D7o.%DF%BF%ADh2%AB%D3%F2%CF%EB%97/_Vo%BB%19%A6uu_]%F6%F3%F9v%D1%F92%C5%F8%B8Hq%15%17%3E%92H%D00%5E%B8%F5fe%FD%06%0F%7B%F9y%13%17%EB%7C]%EAW%D5%FB%EB%ABo%AAnQ%D9j%DE%BBn%16+%1BN%EF%20%B6q%F1%B1[%F5%8B9%C4%B9%BA%B3%1Fc%E5b%5CT!~%8C%B3~%19C%E5%EE%AB%CD],%B7%BF~y%F3M%F5%A9_%FDHb%7B%BB%EA%B7Kt%F0.%AEc%15%F7/%B3+%7C%9C%C7%D5-]d%D7U%A4)%D9tx%F2%AA,%84j=.%83%DC%B0%0D%9A%8B%1E%CD%AA%18nc%B5%88%1Bz%C1%FA%ACz%D5%7FB
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.218.128.152|:443... connected.
HTTP request sent, awaiting response... 400 Bad Request
2018-08-11 00:04:20 ERROR 400: Bad Request.

The name is too long, 243 chars total.
Trying to shorten...
New name is R�է.�%10�����4��M?%C7%90%F9I%97%E7%D1%DF%D9E%B7%9E%9FT%DDY%3C;%A9%5E__%DC%5C%5CUO1%16+%7B%BA%EE%D0%A8%87%F7%8D%13:5%07%CF%CE%AA?F%BCj%8E%0EmrW%F2%A4%F6%D36%7BH%16%FE%FC%88f%A1%F1%D0%0C%A9C%AB%1E%AE%B3%3CAG&%7B%98%91%2F%0C%CE%FF?)%FF%DF.
Incomplete or invalid multibyte sequence encountered
--2018-08-11 00:04:20--  https://s3-us-west-2.amazonaws.com/ai2-s2-research-public/open-corpus/corpus-2018-05-03/R%BF%D5%A7.%90%10%85%B8%9C%F5%F74%B1%DBM?%C7%90%F9I%97%E7%D1%DF%D9E%B7%9E%9FT%DDY%3C;%A9%5E__%DC%5C%5CUO1%16+%7B%BA%EE%D0%A8%87%F7%8D%13:5%07%CF%CE%AA?F%BCj%8E%0EmrW%F2%A4%F6%D36%7BH%16%FE%FC%88f%A1%F1%D0%0C%A9C%AB%1E%AE%B3%3CAG&%7B%98%91/%0C%CE%FF?)%FF%DF%9B%142
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.218.128.152|:443... connected.
HTTP request sent, awaiting response... 400 Bad Request
2018-08-11 00:04:20 ERROR 400: Bad Request.

The name is too long, 243 chars total.
Trying to shorten...
New name is R�է.�%10�����4��M?%C7%90%F9I%97%E7%D1%DF%D9E%B7%9E%9FT%DDY%3C;%A9%5E__%DC%5C%5CUO1%16+%7B%BA%EE%D0%A8%87%F7%8D%13:5%07%CF%CE%AA?F%BCj%8E%0EmrW%F2%A4%F6%D36%7BH%16%FE%FC%88f%A1%F1%D0%0C%A9C%AB%1E%AE%B3%3CAG&%7B%98%91%2F%0C%CE%FF?)%FF%DF.
Incomplete or invalid multibyte sequence encountered
--2018-08-11 00:04:20--  https://s3-us-west-2.amazonaws.com/ai2-s2-research-public/open-corpus/corpus-2018-05-03/R%BF%D5%A7.%90%10%85%B8%9C%F5%F74%B1%DBM?%C7%90%F9I%97%E7%D1%DF%D9E%B7%9E%9FT%DDY%3C;%A9%5E__%DC%5C%5CUO1%16+%7B%BA%EE%D0%A8%87%F7%8D%13:5%07%CF%CE%AA?F%BCj%8E%0EmrW%F2%A4%F6%D36%7BH%16%FE%FC%88f%A1%F1%D0%0C%A9C%AB%1E%AE%B3%3CAG&%7B%98%91/%0C%CE%FF?)%FF%DF%9B%142
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.218.128.152|:443... connected.
HTTP request sent, awaiting response... 400 Bad Request
2018-08-11 00:04:20 ERROR 400: Bad Request.

This is only a small part of the output. It keeps going until I stop it manually; it never seems to finish on its own.

1 answer:

Answer 0: (score: 0)

This is simply an incorrect use of wget, not a bug in it. From the man page:

-i file, --input-file=file: Read URLs from a local or external file.

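So the command you used downloads https://s3-us-west-2.amazonaws.com/ai2-s2-research-public/open-corpus/corpus-2018-05-03/s2-corpus-02.gz, tries to parse URLs out of its binary content, and then "gets" those URLs. The invalid URLs (from the binary content) only cause more errors. The first bogus request in the log, %1F%8B%08, is the start of the gzip header, which shows that wget was reading the archive itself as its URL list.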
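For a single file, dropping -i is enough, because wget treats a plain argument as the URL to download rather than as a list of URLs. A minimal sketch, assuming the file paths from the question; fetch_one is a hypothetical helper name used only for illustration:

```shell
#!/bin/sh
# Base location of the corpus, taken from the URLs in the question.
BASE_URL="https://s3-us-west-2.amazonaws.com/ai2-s2-research-public/open-corpus"

# fetch_one (hypothetical helper): download exactly one file.
# There is no -i here: the argument is the file to download,
# not a file that lists URLs.
fetch_one() {
    wget "$BASE_URL/$1"
}

# Usage (left commented out because the file is roughly 950 MB):
# fetch_one corpus-2018-05-03/s2-corpus-02.gz
```

With this form wget saves s2-corpus-02.gz and exits, without trying to interpret the archive.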

The correct and simple solution is to download manifest.txt, edit its contents down to the files you want, and only then use it with wget -i.
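One way to do that editing without a text editor is to filter the manifest with grep. This is only a sketch: it assumes the manifest holds one relative path per line, as shown in the question, and select_urls and selected.txt are hypothetical names. The helper prefixes each kept path with the base URL so that the local list contains absolute URLs.

```shell
#!/bin/sh
# Base location of the corpus, taken from the URLs in the question.
BASE_URL="https://s3-us-west-2.amazonaws.com/ai2-s2-research-public/open-corpus"

# select_urls (hypothetical helper): read manifest lines on stdin,
# keep those matching the extended regular expression in $1,
# and print each one as an absolute URL.
select_urls() {
    grep -E -- "$1" | sed "s|^|$BASE_URL/|"
}

# Usage sketch (left commented out because it downloads large files):
# wget -q -O - "$BASE_URL/manifest.txt" \
#     | select_urls 's2-corpus-0[23]\.gz$' > selected.txt
# wget -i selected.txt
```

Here -i is given a text file of URLs, which is what the option expects, so wget fetches only the selected archives.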