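I want to download a PGN text file from this URL: http://www.chess.com/echess/download_pgn?lid=1222621131. I have the following (edited) code, which should do that, but it downloads an HTML page instead. What might I be doing wrong?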
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	url := "http://www.chess.com/echess/download_pgn?lid=1222621131"
	filename := "game.pgn"

	resp, err := http.Get(url)
	...
	file, err := os.Create(filename)
	defer file.Close()
	...
	size, err := io.Copy(file, resp.Body)
}
Answer 0 (score: 2)
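My first guess is that your request does not carry the authentication, cookies, and headers that a browser session normally supplies. As an experiment, open Chrome in incognito mode, open the developer tools, and request the URL above in that window. When I do this, I look at the first GET in Chrome's Network tab; the request and response details are listed at the end of this answer. Note the response code of 302, which means the resource was found but you are being redirected. Now look at the Location header: it reads '/login'. Go's http.Get follows redirects by default, so I suspect the login page is what your code is downloading, because your Go program, unlike your browser, has no login session or cookie for this site.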
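Browsers do a lot of work for us when we browse a site. Reproducing that in code from scratch can take some effort: you have to take care of cookies, authentication, headers, redirects, and so on.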
Remote Address: 174.35.7.172:80
Request URL: http://www.chess.com/echess/download_pgn?lid=1222621131
Request Method: GET
Status Code: 302 Found

Response Headers
HTTP/1.1 302 Found
Date: Sat, 25 Jul 2015 20:49:43 GMT
Server: PWS/8.1.20.22
X-Px: ms h0-s1027.p12-sjc ( origin)
P3P: CP="ALL DSP COR LAW CURa ADMa DEVa TAIa OUR BUS IND ONL UNI COM NAV DEM CNT"
Cache-Control: private
Pragma: no-cache
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Content-Length: 0
Content-Type: text/html; charset=utf-8
Location: /login
Connection: keep-alive
Set-Cookie: PHPSESSID=pach18her77q4asgsq2heohvj1; path=/; domain=.chess.com; HttpOnly

Request Headers
GET /echess/download_pgn?lid=1222621131 HTTP/1.1
Host: www.chess.com
Connection: keep-alive
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8,es;q=0.6

Query String Parameters
lid: 1222621131