Question

I'm scraping a website with colly. This is my code, using the OnHTML callback:

package main

import (
    "fmt"
    "github.com/gocolly/colly"
)

func main() {

    // Instantiate default collector
    c := colly.NewCollector()

    // Call the callback on every h3 element
    c.OnHTML("h3", func(e *colly.HTMLElement) {
        link := e.Text
        // Print the heading text
        fmt.Printf("Link found: %q -> %s\n", e.Text, link)
        // Treat the heading text as a link and visit it
        c.Visit(e.Request.AbsoluteURL(link))
    })

    // Before making a request print "Visiting ..."
    c.OnRequest(func(r *colly.Request) {
        fmt.Println("Visiting", r.URL.String())
    })

    // Start scraping on https://bbs.archusers.ir/
    c.Visit("https://bbs.archusers.ir/")
}

For example, I want to get every element with the id "id Name", or every element with the class "class Name". How can I do that?

Answer 1

I found the answer here. It is a really good tutorial for the colly framework.

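OnHTML is a powerful tool. It takes a CSS selector (e.g. div.my_fancy_class or #someElementId) and runs its callback for each matching element, and you can attach multiple OnHTML callbacks to a collector to handle different page types.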
