Is Guzzle adding lines to the text/csv files I download?

Time: 2016-08-26 09:21:45

Tags: php laravel csv guzzle

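I'm new to Guzzle. I'm trying to download various .csv files from the USGS (http://earthquake.usgs.gov/earthquakes/feed/v1.0/csv.php).

Sometimes the downloaded file content starts with a few extra bytes followed by "\r\n". In other words, an extra line is sometimes prepended to the file contents.

Example: 0000AFEB\r\n

As far as I can tell the server does not return this, so maybe it is being added by PHP/Guzzle?

Can anyone point me in the right direction for finding out what this actually is? :D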

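Below I've pasted some terminal output to illustrate the problem.

Because of Stack Overflow's restrictions on posting links, I've replaced the string http:// with hxxp:// throughout this post!

I believe this is the raw output from the USGS HTTP server: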

me@localhost:~/Desktop$ curl -s hxxp://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_hour.csv | hexdump -c -n 50
0000000   t   i   m   e   ,   l   a   t   i   t   u   d   e   ,   l   o
0000010   n   g   i   t   u   d   e   ,   d   e   p   t   h   ,   m   a
0000020   g   ,   m   a   g   T   y   p   e   ,   n   s   t   ,   g   a
0000030   p   ,                                                        
0000032

This is an interactive Laravel console session that downloads the same file as the curl command above (but using Guzzle). This seems to work as expected...

me@localhost:~/Desktop$ php artisan tinker
Psy Shell v0.7.2 (PHP 7.0.8-0ubuntu0.16.04.2 — cli) by Justin Hileman

>>> $client = new GuzzleHttp\Client();
=> GuzzleHttp\Client {#680}

>>> $url = 'hxxp://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_hour.csv';
=> "hxxp://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_hour.csv"

>>> $result = $client->request('GET', $url, ['save_to' => '/tmp/all_hour.csv']);
=> GuzzleHttp\Psr7\Response {#708}

# Dump the first 200 characters of the downloaded file
>>> dd(substr(file_get_contents('/tmp/all_hour.csv'), 0, 200));
"""
time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource\n
2016-08-26T08:21:47.530Z,33.8725,-116.94
"""

Now the same thing with another (larger) .csv file. Checking the raw output, it looks fine...

me@localhost:~/Desktop$ curl -s hxxp://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.csv | hexdump -c -n 50
0000000   t   i   m   e   ,   l   a   t   i   t   u   d   e   ,   l   o
0000010   n   g   i   t   u   d   e   ,   d   e   p   t   h   ,   m   a
0000020   g   ,   m   a   g   T   y   p   e   ,   n   s   t   ,   g   a
0000030   p   ,                                                        
0000032

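The same download with Guzzle in a Laravel session. This time the downloaded file content is prefixed with an extra line (the string "0000AFEB\r\n"). What is it, and why is it there?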

me@localhost:~/Desktop$ php artisan tinker
Psy Shell v0.7.2 (PHP 7.0.8-0ubuntu0.16.04.2 — cli) by Justin Hileman

>>> $client = new GuzzleHttp\Client();
=> GuzzleHttp\Client {#680}

>>> $url = 'hxxp://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.csv';
=> "hxxp://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.csv"

>>> $result = $client->request('GET', $url, ['save_to' => '/tmp/all_day.csv']);
=> GuzzleHttp\Psr7\Response {#708}

>>> dd(substr(file_get_contents('/tmp/all_day.csv'), 0, 200));
"""
0000AFEB\r\n
time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource\n
2016-08-26T08:29:05.500Z,33.49
"""
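One possible lead, offered here as an assumption rather than a diagnosis: "0000AFEB\r\n" has the shape of a chunk-size line from HTTP/1.1 chunked transfer coding (RFC 7230, section 4.1), where each chunk of a response body is preceded by its length in hexadecimal followed by CRLF. Read as hex, 0xAFEB is 45035 bytes, a number that can be compared with the size of the CSV data that follows. The helper name `leadingChunkSize` below is hypothetical; it just checks a saved file for such a prefix and returns the size it declares.

```php
<?php
// Hypothetical helper (not part of Guzzle): if $body starts with a line that
// looks like an HTTP/1.1 chunk-size line (1-15 hex digits followed by CRLF),
// return the size it declares, in bytes; otherwise return null.
// Heuristic only: a file whose first line really is a short hex string
// would also match.
function leadingChunkSize(string $body): ?int
{
    // \A anchors the match at the very start of the string.
    if (preg_match('/\A([0-9A-Fa-f]{1,15})\r\n/', $body, $m) === 1) {
        return (int) hexdec($m[1]);
    }
    return null;
}

// For a file starting with "0000AFEB\r\n" this returns 45035:
// leadingChunkSize(file_get_contents('/tmp/all_day.csv'));
```

A clean download such as /tmp/all_hour.csv starts with "time,...", which is not hex, so the helper returns null for it.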

When this happens, some lines are also appended at the end, for example:

\n
00000000\n
\n
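Under the same assumption, the trailing 00000000 would be the zero-size "last chunk" that terminates a chunked body, which would mean the chunk framing was written to the file instead of being removed. The sketch below is a hypothetical `decodeChunked` function, not a Guzzle API: it strips that framing from a saved body. It can serve as a check of the hypothesis (if the output is a clean CSV, the framing explanation fits) or as a stopgap. It tolerates bare "\n" line endings as well as CRLF, which is an assumption based on the dumps above.

```php
<?php
// Hypothetical helper: removes HTTP/1.1 chunked framing from a raw body.
// Each chunk is "<hex size>[;extension]\r\n<data>\r\n"; a size of 0 ends the
// body. Any trailer fields after the last chunk are ignored.
function decodeChunked(string $raw): string
{
    $out = '';
    $pos = 0;
    $len = strlen($raw);
    while ($pos < $len) {
        $eol = strpos($raw, "\n", $pos);
        if ($eol === false) {
            throw new UnexpectedValueException("Missing chunk-size line at offset $pos");
        }
        // Size line without its line ending and without any ";extension".
        $line = rtrim(substr($raw, $pos, $eol - $pos), "\r");
        $sizeHex = trim(explode(';', $line, 2)[0]);
        if (preg_match('/\A[0-9A-Fa-f]{1,15}\z/', $sizeHex) !== 1) {
            throw new UnexpectedValueException("Invalid chunk size '$line'");
        }
        $size = (int) hexdec($sizeHex);
        $pos = $eol + 1;
        if ($size === 0) {
            break; // last chunk reached
        }
        $out .= substr($raw, $pos, $size);
        $pos += $size;
        // Skip the line ending that closes the chunk data.
        if (substr($raw, $pos, 2) === "\r\n") {
            $pos += 2;
        } elseif (substr($raw, $pos, 1) === "\n") {
            $pos += 1;
        }
    }
    return $out;
}
```

Applied to /tmp/all_day.csv, e.g. `decodeChunked(file_get_contents('/tmp/all_day.csv'))`, this should return plain CSV if the file really contains one or more framed chunks; it throws if the data does not follow that format.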

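I'm sure there is a sensible explanation for this behavior... but I really can't find it!

Here are the server headers returned for the two files used above...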

me@localhost:~/Desktop$ curl -I hxxp://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_hour.csv
HTTP/1.1 200 OK
X-Powered-By: PHP/5.5.35
Last-Modified: Fri, 26 Aug 2016 08:33:01 GMT
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: *
Access-Control-Allow-Headers: accept,origin,authorization,content-type
Content-Type: text/csv
Cache-Control: public, max-age=277
Expires: Fri, 26 Aug 2016 08:41:50 GMT
Date: Fri, 26 Aug 2016 08:37:13 GMT
Connection: keep-alive

me@localhost:~/Desktop$ curl -I hxxp://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.csv
HTTP/1.1 200 OK
X-Powered-By: PHP/5.5.35
Last-Modified: Fri, 26 Aug 2016 08:32:32 GMT
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: *
Access-Control-Allow-Headers: accept,origin,authorization,content-type
Content-Type: text/csv
Cache-Control: public, max-age=90
Expires: Fri, 26 Aug 2016 08:38:59 GMT
Date: Fri, 26 Aug 2016 08:37:29 GMT
Connection: keep-alive
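Note that `curl -I` sends a HEAD request, so these headers are not necessarily the ones the server sends with the GET that Guzzle makes. Neither listing shows Content-Length or Transfer-Encoding. On the command line, `curl -s -D - -o /dev/null <url>` prints the headers of an actual GET. `curl --raw` turns off curl's own decoding of transfer encodings, so `curl -s --raw <url> | hexdump -c -n 50` would show any chunk-size line that curl normally hides. From the tinker session, the Response object's PSR-7 `getHeaders()` method returns the GET headers. The helper below, `declaresChunked`, is hypothetical and checks such an array for chunked transfer coding.

```php
<?php
// Hypothetical helper: given headers shaped like PSR-7's getHeaders()
// (header name => list of values), report whether the response declared
// chunked transfer coding. Header names are compared case-insensitively.
function declaresChunked(array $headers): bool
{
    foreach ($headers as $name => $values) {
        if (strcasecmp((string) $name, 'Transfer-Encoding') !== 0) {
            continue;
        }
        foreach ((array) $values as $value) {
            // A single value may list several codings, e.g. "gzip, chunked".
            foreach (explode(',', $value) as $coding) {
                if (strcasecmp(trim($coding), 'chunked') === 0) {
                    return true;
                }
            }
        }
    }
    return false;
}

// Usage with the Response returned in the tinker session:
// declaresChunked($result->getHeaders());
```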

0 answers:

No answers yet