Question

I am trying to fetch the HTML source of Reddit using Golang:

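package main

import (
    "fmt"
    "io"
    "net/http"
    "time"
)

func main() {
    // HTTP client whose requests are each limited to 5 seconds.
    client := http.Client{
        Timeout: 5 * time.Second,
    }
    resp, err := client.Get("https://www.reddit.com/")
    if err != nil {
        fmt.Println("request failed:", err)
        return
    }
    // Defer the close right after the error check, before the body is read.
    defer resp.Body.Close()
    body, err := io.ReadAll(resp.Body)
    if err != nil {
        fmt.Println("reading body failed:", err)
        return
    }
    fmt.Println("HTML:\n\n", string(body))
    // Wait for Enter so the console window stays open.
    var input string
    fmt.Scanln(&input)
}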

The first attempt works fine, but on the second attempt I get this error:

<p>we're sorry, but you appear to be a bot and we've seen too many requests
from you lately. we enforce a hard speed limit on requests that appear to come
from bots to prevent abuse.</p>

<p>if you are not a bot but are spoofing one via your browser's user agent
string: please change your user agent string to avoid seeing this message
again.</p>

<p>please wait 6 second(s) and try again.</p>

    <p>as a reminder to developers, we recommend that clients make no
    more than <a href="http://github.com/reddit/reddit/wiki/API">one
    request every two seconds</a> to avoid seeing this message.</p>

I tried setting a delay, but it still does not work. Sorry for my bad English.

Answer 1

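Reddit does not want automated scanners/scrapers on their site and has a bot-protection mechanism. Here is their recommendation:

one request every two seconds

Just add a delay between requests.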

Answer 2

The timeout serves a different purpose: it is the upper limit on how long the call may run. What you need is a sleep between subsequent requests.

time.Sleep(6 * time.Second)
