Scraping web page content with golang

Date: 2018-11-18 05:38:13

Tags: html go

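I'm a beginner in programming and I'm learning web scraping. Is it possible to get the data inside an HTML comment, like the one in this row?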

    <tbody id="the-list">
    <tr>
        <td valign="top" align="right">1.</td>
        <td valign="top">BEKASI</td>
        <td valign="top">Tambun</td>
        <td valign="top">Selatan</td>
        <td valign="top">01.4.13.16.06.000013</td>
        <td valign="top">Jalan</td>
        <td valign="top">PERUM BEKASI GRIYA ASRI</td>
        <td valign="top">1.500 m<sup>2</sup></td>
        <td valign="top" align="center">Kantor</td>
        <td valign="top">400 m<sup>2</sup></td>
        <td valign="top" align="center">1998</td>
        <td valign="top" align="center">> 200</td>

        <!--
        <td valign="top" align="center">-6.2245</td>
        <td valign="top" align="center">107.0827</td>
        -->

        <td valign="top" align="right">3</td>
        <td valign="top" align="right">7</td>
        <td valign="top" align="right">2</td>
        <td valign="top" align="right">150</td>
        <td valign="top">08888123</td>
        <td valign="top">-</td>

    </tr>

I'd like the result to look like this:

1.;BEKASI;Tambun;Selatan;01.4.13.16.06.000013;Jalan;PERUM BEKASI GRIYA ASRI;1.500 m;Kantor;400 m;1998;200;-6.2245;107.0827;3;7;2;150;08888123;-

1 answer:

Answer 0 (score: 0)

goquery is a great library for parsing HTML content in Go; it lets you select elements with jQuery-style CSS selectors.

    package main

    import (
        "fmt"
        "log"
        "strings"

        "github.com/PuerkitoBio/goquery"
    )

    func main() {
        html := `
          <table><tbody id="the-list">
             <tr>
                <td valign="top" align="right">1.</td>
                <td valign="top">BEKASI</td>
                <td valign="top">Tambun</td>
                <td valign="top">Selatan</td>
                <td valign="top">01.4.13.16.06.000013</td>
                <td valign="top">Jalan</td>
                <td valign="top">PERUM BEKASI GRIYA ASRI</td>
                <td valign="top">1.500 m<sup>2</sup></td>
                <td valign="top" align="center">Kantor</td>
                <td valign="top">400 m<sup>2</sup></td>
                <td valign="top" align="center">1998</td>
                <td valign="top" align="center">> 200</td>

                <!--
                <td valign="top" align="center">-6.2245</td>
                <td valign="top" align="center">107.0827</td>
                -->

                <td valign="top" align="right">3</td>
                <td valign="top" align="right">7</td>
                <td valign="top" align="right">2</td>
                <td valign="top" align="right">150</td>
                <td valign="top">08888123</td>
                <td valign="top">-</td>

            </tr>
       </tbody></table>
    `
        // Parse the HTML instead of silently discarding the error.
        doc, err := goquery.NewDocumentFromReader(strings.NewReader(html))
        if err != nil {
            log.Fatal(err)
        }

        // Print the text of every <td> inside #the-list. The two cells wrapped in
        // <!-- ... --> form a single comment node, not <td> elements, so they are not printed.
        doc.Find("#the-list td").Each(func(_ int, s *goquery.Selection) {
            fmt.Println(s.Text())
        })
    }