Web scraping with Google Apps Script (using Google Sheets)

Time: 2018-05-04 03:56:05

Tags: google-apps-script web-scraping google-sheets html-parsing

What is the proper way to scrape an HTML page using Google Apps Script?

<html class="gr__mcb_mu"><head></head><body data-gr-c-s-loaded="true"><table class="table table--decorated">
    <tbody><tr>
        <th><span class="visually-hidden">Currency</span></th>
        <th class="content-right">Buy</th>
        <th class="content-right">Sell</th>
    </tr>
            <tr>
                <td>AUD</td>
                <td class="content-right">25.71</td>
                <td class="content-right">26.74</td>
            </tr>
            <tr>
                <td>EUR</td>
                <td class="content-right">41.16</td>
                <td class="content-right">42.39</td>
            </tr>
            <tr>
                <td>GBP</td>
                <td class="content-right">46.7</td>
                <td class="content-right">48.1</td>
            </tr>
            <tr>
                <td>JPY</td>
                <td class="content-right">31.01</td>
                <td class="content-right">32.25</td>
            </tr>
            <tr>
                <td>USD</td>
                <td class="content-right">34.35</td>
                <td class="content-right">35.25</td>
            </tr>
            <tr>
                <td>ZAR</td>
                <td class="content-right">2.68</td>
                <td class="content-right">2.81</td>
            </tr>
</tbody></table>
<p class="content-left"><small>MCB indicative rates (TT) against MUR on 03/05/2018</small><br></p></body></html>

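I want to get the values in the second and third columns of the table row whose first column equals "USD". In other words, I expect 34.35 and 35.25 as output.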

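My first attempt was to hard-code the query with a regex, but that could break as soon as a minor change is made to the HTML page. Is there a better way? (A built-in library or something similar?)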
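One possible direction, offered as a sketch rather than a definitive answer: instead of a regex tied to specific values, read the table generically into rows and cells, then look the row up by its first cell. The helper names `parseTableRows`, `findRates`, and `logUsdRates`, and the URL below, are hypothetical and chosen only for illustration. The pattern matching still relies on `<tr>`, `<td>`, and `<th>` tags, so it assumes the page keeps a plain, non-nested table. `XmlService.parse` is a built-in alternative, but it expects well-formed XML, so the unclosed `<br>` in this page would probably require passing only the `<table>...</table>` fragment to it. If the page is publicly reachable, the Sheets function `IMPORTHTML` may also be worth trying.

```javascript
/**
 * Parse every <tr> in an HTML string into an array of trimmed cell texts.
 * Relies only on tag structure (tr/th/td), not on classes or cell values.
 */
function parseTableRows(html) {
  var rows = [];
  var rowPattern = /<tr\b[^>]*>([\s\S]*?)<\/tr>/gi;
  var rowMatch;
  while ((rowMatch = rowPattern.exec(html)) !== null) {
    var cells = [];
    var cellPattern = /<t[dh]\b[^>]*>([\s\S]*?)<\/t[dh]>/gi;
    var cellMatch;
    while ((cellMatch = cellPattern.exec(rowMatch[1])) !== null) {
      // Strip nested tags such as <span>, then surrounding whitespace.
      cells.push(cellMatch[1].replace(/<[^>]+>/g, '').trim());
    }
    rows.push(cells);
  }
  return rows;
}

/**
 * Return {buy, sell} for the row whose first cell equals `currency`,
 * or null when no such row exists.
 */
function findRates(html, currency) {
  var rows = parseTableRows(html);
  for (var i = 0; i < rows.length; i++) {
    if (rows[i].length >= 3 && rows[i][0] === currency) {
      return { buy: parseFloat(rows[i][1]), sell: parseFloat(rows[i][2]) };
    }
  }
  return null;
}

/**
 * Apps Script entry point (runs only inside Google Apps Script).
 * The URL is a placeholder assumption, not the real page address.
 */
function logUsdRates() {
  var url = 'https://example.com/rates'; // hypothetical URL
  var html = UrlFetchApp.fetch(url).getContentText();
  var rates = findRates(html, 'USD');
  if (rates === null) {
    Logger.log('USD row not found');
    return;
  }
  // Append a timestamped row to the active sheet.
  SpreadsheetApp.getActiveSheet().appendRow([new Date(), rates.buy, rates.sell]);
}
```

Looking the row up by its currency code rather than by position means that reordering or adding currencies should not break the lookup, although a change to the table's column order still would.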

0 answers:

No answers