I want to parse anchor links from HTML content. Here is a sample of my HTML:
<a class="productnamecolor colors_productname" href="http://www.cccxcxc.com/Nautical-Bubble-Romper-p/s15brpnt03.htm">
<a class="productnamecolor colors_productname" href="http://www.dewewe.com/Nautical-Bubble-Romper-p/erewrwer.htm">
<a class="productnamecolor colors_productname" href="http://www.sdsddsd.com/Nautical-Bubble-Romper-p/dsadadasd.htm">
Each anchor has an href attribute, and I want to get its value, but my code gives me this error:
Error: multiple-value s.Attr() in single-value context
package main

import (
	"fmt"
	"log"

	"github.com/PuerkitoBio/goquery"
)

func ExampleScrape() {
	doc, err := goquery.NewDocument("http://www.myurl.com/category-s/1828.htm")
	if err != nil {
		log.Fatal(err)
	}
	/* My sample HTML after the page is fetched:
	<a class="productnamecolor colors_productname" href="http://www.cccxcxc.com/Nautical-Bubble-Romper-p/s15brpnt03.htm">
	<a class="productnamecolor colors_productname" href="http://www.dewewe.com/Nautical-Bubble-Romper-p/erewrwer.htm">
	<a class="productnamecolor colors_productname" href="http://www.sdsddsd.com/Nautical-Bubble-Romper-p/dsadadasd.htm">
	*/
	doc.Find("table.v65-productDisplay a.productnamecolor").Each(func(i int, s *goquery.Selection) {
		band := s.Attr("href") // I want the value of the href attribute here, but this line does not compile.
		fmt.Printf(band)
	})
}

func main() {
	ExampleScrape()
}
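Answer 0: (score: 6)
Selection.Attr
returns two values: the attribute value and a boolean that reports whether the attribute exists (if it is false, the attribute value is the empty string).
Go does not let you assign a multi-value return to a single variable, so you have to receive both results and change your code to the following: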
doc.Find("table.v65-productDisplay a.productnamecolor").Each(func(i int, s *goquery.Selection) {
	// Attr returns (value, exists); receive both results.
	band, ok := s.Attr("href")
	if ok {
		fmt.Println(band)
	}
})
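The two-value pattern described in this answer can be tried without any third-party dependency. The sketch below is only an illustration: `attr` and the map-based "element" are hypothetical stand-ins that mimic the (value, exists) shape of `Selection.Attr`; they are not part of goquery.

```go
package main

import "fmt"

// attr is a hypothetical stand-in with the same result shape as a
// two-value attribute lookup: the value, and whether the key was present.
func attr(attrs map[string]string, name string) (string, bool) {
	val, exists := attrs[name]
	return val, exists
}

func main() {
	// A map standing in for one anchor element (assumption for illustration).
	anchor := map[string]string{
		"class": "productnamecolor colors_productname",
		"href":  "http://www.cccxcxc.com/Nautical-Bubble-Romper-p/s15brpnt03.htm",
	}

	// Both results must be received; `href := attr(anchor, "href")`
	// would fail with the "multiple-value ... in single-value context" error.
	if href, ok := attr(anchor, "href"); ok {
		fmt.Println(href)
	}

	// The blank identifier discards the boolean when presence does not matter;
	// a missing attribute then simply yields the empty string.
	title, _ := attr(anchor, "title")
	fmt.Printf("title=%q\n", title)
}
```

Checking the boolean is useful when an empty attribute (`href=""`) must be told apart from a missing one; if that distinction is irrelevant, discarding it with `_` is enough.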
Answer 1: (score: 0)
You can also use the golang.org/x/net/html package.
package main

import (
	"fmt"
	"log"
	"strings"

	"golang.org/x/net/html"
)

func main() {
	s := `<a class="productnamecolor colors_productname" href="http://www.cccxcxc.com/Nautical-Bubble-Romper-p/s15brpnt03.htm">
<a class="productnamecolor colors_productname" href="http://www.dewewe.com/Nautical-Bubble-Romper-p/erewrwer.htm">
<a class="productnamecolor colors_productname" href="http://www.sdsddsd.com/Nautical-Bubble-Romper-p/dsadadasd.htm">`
	doc, err := html.Parse(strings.NewReader(s))
	if err != nil {
		log.Fatal(err)
	}
	// f walks the node tree depth-first and prints the href of every <a> element.
	var f func(*html.Node)
	f = func(n *html.Node) {
		if n.Type == html.ElementNode && n.Data == "a" {
			for _, a := range n.Attr {
				if a.Key == "href" {
					fmt.Println(a.Val)
					break
				}
			}
		}
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			f(c)
		}
	}
	f(doc)
}
// outputs
// http://www.cccxcxc.com/Nautical-Bubble-Romper-p/s15brpnt03.htm
// http://www.dewewe.com/Nautical-Bubble-Romper-p/erewrwer.htm
// http://www.sdsddsd.com/Nautical-Bubble-Romper-p/dsadadasd.htm
Hope this helps someone.
Answer 2: (score: 0)
I wrote an alternative package that can handle this:
package main

import (
	"github.com/89z/mech"
	"strings"
)

const source = `
<a class="productnamecolor colors_productname" href="http://www.cccxcxc.com/Nautical-Bubble-Romper-p/s15brpnt03.htm">
<a class="productnamecolor colors_productname" href="http://www.dewewe.com/Nautical-Bubble-Romper-p/erewrwer.htm">
<a class="productnamecolor colors_productname" href="http://www.sdsddsd.com/Nautical-Bubble-Romper-p/dsadadasd.htm">
`

func main() {
	doc, err := mech.Parse(strings.NewReader(source))
	if err != nil {
		panic(err)
	}
	a := doc.ByTag("a")
	for a.Scan() {
		println(a.Attr("href"))
	}
}