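I'm new to Guzzle. I'm trying to download various .csv files from USGS (http://earthquake.usgs.gov/earthquakes/feed/v1.0/csv.php).
Sometimes the downloaded file content contains some extra bytes followed by "\r\n". In other words, sometimes an extra line is prefixed to the file content.
Example: 0000AFEB\r\n
As far as I can tell, the server does not return this content, so maybe it is added by PHP/Guzzle?
Can anyone point me in the right direction on how to find out what this actually is? :D
Below I've pasted some terminal output to illustrate the problem.
Due to StackOverflow's link posting limits, I have replaced the string http:// with hxxp:// in this post!
I believe this is the raw output from the USGS HTTP server: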
me@localhost:~/Desktop$ curl -s hxxp://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_hour.csv | hexdump -c -n 50
0000000 t i m e , l a t i t u d e , l o
0000010 n g i t u d e , d e p t h , m a
0000020 g , m a g T y p e , n s t , g a
0000030 p ,
0000032
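This is an interactive Laravel console (tinker) session that downloads the same file as the curl command above, but using Guzzle. This seems to work as expected...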
me@localhost:~/Desktop$ php artisan tinker
Psy Shell v0.7.2 (PHP 7.0.8-0ubuntu0.16.04.2 — cli) by Justin Hileman
>>> $client = new GuzzleHttp\Client();
=> GuzzleHttp\Client {#680}
>>> $url = 'hxxp://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_hour.csv';
=> "hxxp://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_hour.csv"
>>> $result = $client->request('GET', $url, ['save_to' => '/tmp/all_hour.csv']);
=> GuzzleHttp\Psr7\Response {#708}
# Dump the first 200 characters of the downloaded file
>>> dd(substr(file_get_contents('/tmp/all_hour.csv'), 0, 200));
"""
time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource\n
2016-08-26T08:21:47.530Z,33.8725,-116.94
"""
Trying the same thing on another (larger) .csv file. Checking the raw output first, it looks fine...
me@localhost:~/Desktop$ curl -s hxxp://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.csv | hexdump -c -n 50
0000000 t i m e , l a t i t u d e , l o
0000010 n g i t u d e , d e p t h , m a
0000020 g , m a g T y p e , n s t , g a
0000030 p ,
0000032
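Trying the same download with Guzzle in a Laravel session. This example shows that the downloaded file content is preceded by an extra line (the string "0000AFEB\r\n"). What is this? Why is it there?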
me@localhost:~/Desktop$ php artisan tinker
Psy Shell v0.7.2 (PHP 7.0.8-0ubuntu0.16.04.2 — cli) by Justin Hileman
>>> $client = new GuzzleHttp\Client();
=> GuzzleHttp\Client {#680}
>>> $url = 'hxxp://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.csv';
=> "hxxp://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.csv"
>>> $result = $client->request('GET', $url, ['save_to' => '/tmp/all_day.csv']);
=> GuzzleHttp\Psr7\Response {#708}
>>> dd(substr(file_get_contents('/tmp/all_day.csv'), 0, 200));
"""
0000AFEB\r\n
time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource\n
2016-08-26T08:29:05.500Z,33.49
"""
When this happens, some trailing lines are appended to the file as well, for example:
\n
00000000\n
\n
I'm sure there is a reasonable answer/reason for this behavior... but I really can't find it!
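One detail I can check myself: both the prefix (0000AFEB) and the suffix (00000000) look like 8-digit hexadecimal numbers. The sketch below converts such a value to decimal so it can be compared against byte counts in the download. The helper name hex_to_dec is made up for illustration, and reading the value as a byte count is only my assumption, not something the server output confirms.

```shell
#!/bin/sh
# Hypothetical helper: print the decimal value of a hexadecimal string.
hex_to_dec() {
    printf '%d\n' "0x$1"
}

hex_to_dec 0000AFEB   # the prefix line from the all_day.csv download; prints 45035
hex_to_dec 00000000   # the suffix line; prints 0
```

If the first value turned out to match the number of bytes between the prefix line and the next such line, that would suggest the extra lines are some kind of length framing rather than random corruption. I have not verified this.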
Here are the server headers returned for the two files used above...
me@localhost:~/Desktop$ curl -I hxxp://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_hour.csv
HTTP/1.1 200 OK
X-Powered-By: PHP/5.5.35
Last-Modified: Fri, 26 Aug 2016 08:33:01 GMT
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: *
Access-Control-Allow-Headers: accept,origin,authorization,content-type
Content-Type: text/csv
Cache-Control: public, max-age=277
Expires: Fri, 26 Aug 2016 08:41:50 GMT
Date: Fri, 26 Aug 2016 08:37:13 GMT
Connection: keep-alive
me@localhost:~/Desktop$ curl -I hxxp://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.csv
HTTP/1.1 200 OK
X-Powered-By: PHP/5.5.35
Last-Modified: Fri, 26 Aug 2016 08:32:32 GMT
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: *
Access-Control-Allow-Headers: accept,origin,authorization,content-type
Content-Type: text/csv
Cache-Control: public, max-age=90
Expires: Fri, 26 Aug 2016 08:38:59 GMT
Date: Fri, 26 Aug 2016 08:37:29 GMT
Connection: keep-alive
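Note that curl -I sends a HEAD request, so the headers above are not necessarily identical to what a GET returns. A minimal sketch for dumping the headers of a real GET, and for looking at the start of the body without curl decoding it, could look like this. The function names are hypothetical; -D, -o and --raw are standard curl options, and hexdump is used the same way as earlier in this post.

```shell
#!/bin/sh
# Hypothetical helper: print only the response headers of a GET request.
# -D - writes the headers to stdout, -o /dev/null discards the body.
dump_get_headers() {
    curl -s -D - -o /dev/null "$1"
}

# Hypothetical helper: show the first 50 bytes of the body as sent,
# with curl's decoding of content/transfer encodings disabled (--raw).
dump_raw_body_start() {
    curl -s --raw "$1" | hexdump -c -n 50
}

# Usage (needs network access; pass the real feed URL):
# dump_get_headers "$url"
# dump_raw_body_start "$url"
```

Comparing the GET headers and the raw body start for all_hour.csv and all_day.csv should show whether the extra line is already present on the wire or only appears after Guzzle writes the file.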