Golang: download a file instead of an HTML page

Time: 2015-07-25 20:38:50

Tags: go

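I want to download a PGN text file from this URL: http://www.chess.com/echess/download_pgn?lid=1222621131. I have the following (edited) code that should do that, but it downloads an HTML page instead. What might I be doing wrong?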

package main

import (
    "fmt"
    "io"
    "log"
    "net/http"
    "os"
)

func main() {
    url := "http://www.chess.com/echess/download_pgn?lid=1222621131"
    filename := "game.pgn"

    // Request the PGN download URL.
    resp, err := http.Get(url)
    ...

    // Create the local output file.
    file, err := os.Create(filename)
    defer file.Close()

    ...

    // Stream the response body into the file.
    size, err := io.Copy(file, resp.Body)
}

1 answer:

Answer 0 (score: 2)

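My first guess is that you are not supplying the authentication, cookies, and headers that a browser session normally sends. As an experiment, open Chrome in incognito mode, open the developer tools, and request the URL above in that window. When I do this and look at the first GET in Chrome's Network tab, I get the request and response details shown below. Note the 302 response code: the resource was found, but you are being redirected. Now look at the Location header. It reads '/login'. Go's http.Get follows redirects by default, so I suspect the login page is what your code is downloading, because your Go program does not have a login session/cookie for this site the way your browser does.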

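Our browsers do a lot of work for us when we browse a site. Coding that from scratch can take some effort: you have to take care of cookies, authentication, headers, redirects, and so on.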

Remote Address: 174.35.7.172:80
Request URL: http://www.chess.com/echess/download_pgn?lid=1222621131
Request Method: GET
Status Code: 302 Found
Response Headers
HTTP/1.1 302 Found
Date: Sat, 25 Jul 2015 20:49:43 GMT
Server: PWS/8.1.20.22
X-Px: ms h0-s1027.p12-sjc ( origin)
P3P: CP="ALL DSP COR LAW CURa ADMa DEVa TAIa OUR BUS IND ONL UNI COM NAV DEM CNT"
Cache-Control: private
Pragma: no-cache
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Content-Length: 0
Content-Type: text/html; charset=utf-8
Location: /login
Connection: keep-alive
Set-Cookie: PHPSESSID=pach18her77q4asgsq2heohvj1; path=/; domain=.chess.com; HttpOnly
Request Headers
GET /echess/download_pgn?lid=1222621131 HTTP/1.1
Host: www.chess.com
Connection: keep-alive
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8,es;q=0.6
Query String Parameters
lid:1222621131