I'm getting this kind of response from the URL I'm requesting, and I need to parse it to get the HTML out of it.
this=ajax({"htmlInfo":"SOME-HTML","otherInfo":"Blah Blah","moreInfo":"Bleh Bleh"})
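As shown above, there are three key/value pairs, and I need to extract "SOME-HTML". How can I get it? The main problem is that "SOME-HTML" contains escape characters. Here is the kind of response I will get: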
\u003Cdiv class=\u0022container columns-2\u0022\u003E\n\n\u003Csection class=\u0022col-main\u0022\u003E\n\r\n\u003Cdiv class=\u0027visor-article-list list list-view-recent\u0027 \u003E\r\n\u003Cdiv class=\u0027grid_item visor-article-teaser list_default\u0027 \u003E\n\u003Ca class=\u0027grid_img\u0027 href=\u0027/manUnited-is-the-best\u0027\u003E\n\u003Cimg src=\u0022http://www.xyz.com/sites//files/styles/w400h22
Can anyone help me with this? I don't know how to approach it.
Thanks in advance.
Answer
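The easiest way is to extract the JSON and then unmarshal it into a struct. The \uXXXX parts are Unicode escape sequences (for example, \u003C is < and \u0022 is "), and encoding/json decodes them automatically while unmarshalling.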
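To see that decoding step on its own, here is a minimal sketch that unmarshals a single JSON string literal (quotes included) containing the same kind of escapes. The helper name decodeJSONString and the sample string are illustrative assumptions, not part of the original answer.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeJSONString is a hypothetical helper: it decodes one JSON string
// literal (including its surrounding quotes) into a Go string, resolving
// escapes such as \u003C, \u0022 and \n.
func decodeJSONString(raw []byte) (string, error) {
	var s string
	if err := json.Unmarshal(raw, &s); err != nil {
		return "", err
	}
	return s, nil
}

func main() {
	// The backquoted Go literal keeps the backslashes, so encoding/json sees
	// the escapes exactly as they appear in the HTTP response.
	raw := []byte(`"\u003Cdiv class=\u0022container\u0022\u003E\nsecond line"`)
	s, err := decodeJSONString(raw)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(s)
	// Output:
	// <div class="container">
	// second line
}
```

No manual unescaping is needed: once the text is valid JSON, the standard decoder handles every escape the JSON spec defines.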
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// Data follows the structure of the JSON data in the response.
type Data struct {
	HTMLInfo  string `json:"htmlInfo"`
	OtherInfo string `json:"otherInfo"`
	MoreInfo  string `json:"moreInfo"`
}

// dataRegex captures everything between "ajax(" and the last ")".
// (?s) lets . match newlines too, in case the body spans several lines.
var dataRegex = regexp.MustCompile(`(?s)ajax\((.*)\)`)

func main() {
	// input is an example of the raw response data. It's probably a []byte if
	// you got it from io.ReadAll(resp.Body) (ioutil.ReadAll before Go 1.16).
	input := []byte(`this=ajax({"htmlInfo":"\u003Cdiv class=\u0022container columns-2\u0022\u003E\n\n \u003Csection class=\u0022col-main\u0022\u003E\n \r\n\u003Cdiv class=\u0027visor-article-list list list-view-recent\u0027 \u003E\r\n\u003Cdiv class=\u0027grid_item visor-article-teaser list_default\u0027 \u003E\n \u003Ca class=\u0027grid_img\u0027 href=\u0027/manUnited-is-the-best\u0027\u003E\n \u003Cimg src=\u0022http://example.com/sites//files/styles/w400h22", "otherInfo": "Blah Blah", "moreInfo": "Bleh Bleh"})`)

	// First, extract the JSON payload using the regex's capture group.
	// FindSubmatch returns nil if there is no match; otherwise it returns:
	//   0: the full match
	//   1: the contents of the capture group (what we want)
	matches := dataRegex.FindSubmatch(input)
	if len(matches) != 2 {
		fmt.Println("response is not in the expected ajax(...) format")
		return
	}
	dataJSON := matches[1]

	// Since the data is in JSON format, we can unmarshal it into a struct.
	// If you don't care about the fields other than "htmlInfo", you can
	// omit them from the struct.
	data := &Data{}
	if err := json.Unmarshal(dataJSON, data); err != nil {
		fmt.Println("failed to unmarshal data json:", err)
		return
	}

	// You now have access to the "htmlInfo" property.
	fmt.Println("HTML INFO:", data.HTMLInfo)
}
This produces:
HTML INFO: <div class="container columns-2">
<section class="col-main">
<div class='visor-article-list list list-view-recent' >
<div class='grid_item visor-article-teaser list_default' >
<a class='grid_img' href='/manUnited-is-the-best'>
<img src="http://example.com/sites//files/styles/w400h22
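If you would rather avoid a regex, a simple alternative is to slice between the first "(" and the last ")" and decode into a struct that declares only the field you need. This is a minimal sketch assuming the body always has the this=ajax({...}) shape shown in the question; the helper name extractHTMLInfo is hypothetical. If the body comes from an HTTP response, read it with io.ReadAll(resp.Body) first and pass the resulting []byte in.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// extractHTMLInfo is a hypothetical helper: it strips a JSONP-style wrapper
// such as this=ajax({...}) and returns the decoded "htmlInfo" field.
func extractHTMLInfo(body []byte) (string, error) {
	// Take everything between the first '(' and the last ')'. Using the last
	// ')' keeps parentheses inside the HTML from cutting the payload short,
	// and tolerates a trailing ';' after the call.
	start := bytes.IndexByte(body, '(')
	end := bytes.LastIndexByte(body, ')')
	if start < 0 || end <= start {
		return "", errors.New("response is not wrapped in (...)")
	}
	payload := body[start+1 : end]

	// Only the field we need is declared; encoding/json ignores the others.
	var data struct {
		HTMLInfo string `json:"htmlInfo"`
	}
	if err := json.Unmarshal(payload, &data); err != nil {
		return "", fmt.Errorf("decoding JSON payload: %w", err)
	}
	return data.HTMLInfo, nil
}

func main() {
	body := []byte(`this=ajax({"htmlInfo":"\u003Cdiv class=\u0022container\u0022\u003Ehi\u003C/div\u003E","otherInfo":"Blah Blah"})`)
	html, err := extractHTMLInfo(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(html) // <div class="container">hi</div>
}
```

The trade-off versus the regex version is mostly taste: both rely on the wrapper being the outermost pair of parentheses, but the byte-slicing version has no pattern to compile and reports a clearer error when the wrapper is missing.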