R: scraping a web page with xpathSApply

Asked: 2014-02-19 00:10:44

Tags: r web-scraping

After parsing the web page, I can see nodes like the following in the output:

gethelp.df = htmlTreeParse(url, useInternalNodes = TRUE)
gethelp.df
.
.
....
<div class="lia-message-post-date">
        <a class="lia-link-navigation" id="link_14" href="/t5/Facebook/m-p/3947664">
            <span class="DateTime">
        <span class="local-date">?06-05-2013</span>
        <span class="local-time">09:38 AM</span>
</span>
        </a>
    </div>

I want to grab the "06-05-2013" part.

So far I have tried the calls below and a few others, but none of them work. Can anyone point out what I am missing here?

xpathSApply(gethelp.df, "//span[@class='local-time']", xmlGetAttr, "href")
xpathSApply(gethelp.df, "//div[@class='lia-message-post-date']/span", xmlGetAttr, "href")

Thanks!

1 answer:

Answer 0 (score: 4)

xpathSApply(gethelp.df, "//span[@class='local-date']", xmlValue)
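
Some context on why this works where the attempts in the question did not, as far as the posted markup shows: the date is the text content of the span, not an attribute, so xmlValue is the accessor to use. The span elements carry no href attribute (only the a element does), and the path "//div[@class='lia-message-post-date']/span" looks for a span that is a direct child of the div, whereas in the snippet the spans are nested inside the a element. Below is a minimal, self-contained sketch that parses the snippet from the question as a string instead of fetching a URL. The variable names (html, doc, raw_date, date_only, link) are made up for illustration, and the regular expression step assumes the date always has the form dd-dd-dddd; it is only there to drop the stray leading "?" visible in the posted output.

```r
library(XML)

# Sample markup copied from the question; the live page is assumed to look similar.
html <- '<div class="lia-message-post-date">
  <a class="lia-link-navigation" id="link_14" href="/t5/Facebook/m-p/3947664">
    <span class="DateTime">
      <span class="local-date">?06-05-2013</span>
      <span class="local-time">09:38 AM</span>
    </span>
  </a>
</div>'

# htmlParse with asText = TRUE parses a string into an internal document.
doc <- htmlParse(html, asText = TRUE)

# xmlValue returns the text inside each matching node.
raw_date <- xpathSApply(doc, "//span[@class='local-date']", xmlValue)

# Keep only the digits-and-dashes part, dropping any stray leading character.
# Assumption: the date always matches the pattern dd-dd-dddd.
date_only <- regmatches(raw_date, regexpr("[0-9]{2}-[0-9]{2}-[0-9]{4}", raw_date))

# The href attribute lives on the <a> element, so xmlGetAttr is appropriate there.
link <- xpathSApply(doc, "//div[@class='lia-message-post-date']/a",
                    xmlGetAttr, "href")

print(date_only)
print(link)
```

If the date is needed as a Date object rather than a string, as.Date(date_only, format = "%m-%d-%Y") is one option, assuming the site writes month first; the snippet alone does not confirm that ordering.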