以下是我想要提取的网址:
> links
[1] "https://www.makemytrip.com/holidays-india/"
[2] "https://www.makemytrip.com/holidays-india/"
[3] "https://www.yatra.com/india-tour-packages"
[4] "http://www.thomascook.in/tcportal/international-holidays"
[5] "https://www.yatra.com/holidays"
[6] "https://www.travelguru.com/holiday-packages/domestic-packages.shtml"
[7] "https://www.chanbrothers.com/package"
[8] "https://www.tourmyindia.com/packagetours.html"
[9] "http://traveltriangle.com/tour-packages"
[10] "http://www.coxandkings.com/bharatdeko/"
[11] "https://www.sotc.in/india-tour-packages"
我设法使用:
for (i in 1:10){
html <- getURL(links[i], followlocation = TRUE)
parse html
doc = htmlParse(html, asText=TRUE)
plain.text <- xpathSApply(doc, "//text()[not(ancestor::script)][not(ancestor::style)][not(ancestor::noscript)][not(ancestor::form)]", xmlValue)}
但事实是所有提取的数据都保存在&#34; plain.text中。&#34;我如何拥有&#34; plain.text&#34;每个链接?
谢谢。