Suppose I parse a website using the following expression:
library(XML)
url.df_1 = htmlTreeParse("http://www.appannie.com/app/android/com.king.candycrushsaga/", useInternalNodes = T)
If I then run the code below,
xpathSApply(url.df_1, "//div[@class='app_content_section']/h3", function(x) c(xmlValue(x), xmlAttrs(x)[["href"]]))
I get the following:
[1] "Description" "What's new"
[3] "Permissions" "More Apps by King.com All Apps »"
[5] "Customers Also Viewed" "Customers Also Installed"
Now, I am only interested in the "Customers Also Installed" section. However, when I run the following code,
xpathSApply(url.df_1, "//div[@class='app_content_section']/ul/li/a", function(x) c(xmlValue(x), xmlAttrs(x)[["href"]]))
it returns all the apps contained in "More Apps by King.com", "Customers Also Viewed", and "Customers Also Installed".
So I tried
xpathSApply(url.df_1, "//div[h3='Customers Also Installed']", function(x) c(xmlValue(x), xmlAttrs(x)[["href"]]))
But that did not work. So I tried
xpathSApply(url.df_1, "//div[contains(.,'Customers Also Installed')]",xmlValue)
But that did not work either. The output should look like this:
[,1]
[1,] "Christmas Candy Free\n Daniel Development\n "
[2,] "/app/android/xmas.candy.free/"
[,2]
[1,] "Jewel Candy Maker\n Nutty Apps\n "
[2,] "/app/android/com.candy.maker.jewel.nuttyapps/"
[,3]
[1,] "Pogz 2\n Terry Paton\n "
[2,] "/app/android/com.terrypaton.unity.pogz2/"
Any guidance would be greatly appreciated!
Answer 0 (score: 5)
Here is one option (you were really close):
xpathSApply(url.df_1,"//div[contains(.,'Customers Also Installed')]/*/li/a",xmlGetAttr,'href')
[1] "/app/android/xmas.candy.free/"
[2] "/app/android/com.candy.maker.jewel.nuttyapps/"
[3] "/app/android/com.terrypaton.unity.pogz2/"
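If you also want the link text next to each href, as in the matrix shown in the question, something along these lines might work. This is only a sketch: the inline HTML is a made-up stand-in for the page (the real markup may differ), and the variable names html, doc and res are placeholders. It assumes each section is a div with class app_content_section holding an h3 followed by a ul of links, and it uses normalize-space() so stray whitespace around the heading text does not break the match.

```r
library(XML)

# Hypothetical stand-in for the page structure; the real markup may differ.
html <- '
<html><body>
<div class="app_content_section">
  <h3>Customers Also Viewed</h3>
  <ul><li><a href="/app/android/viewed.one/">Viewed One</a></li></ul>
</div>
<div class="app_content_section">
  <h3> Customers Also Installed </h3>
  <ul>
    <li><a href="/app/android/xmas.candy.free/">Christmas Candy Free</a></li>
    <li><a href="/app/android/com.terrypaton.unity.pogz2/">Pogz 2</a></li>
  </ul>
</div>
</body></html>'

# Parse the string directly instead of fetching a URL.
doc <- htmlParse(html, asText = TRUE)

# Select only the section whose h3 matches, then walk down to its links.
# Each link yields c(text, href), so the result simplifies to a 2-row matrix.
res <- xpathSApply(
  doc,
  "//div[@class='app_content_section'][normalize-space(h3)='Customers Also Installed']/ul/li/a",
  function(x) c(xmlValue(x), xmlGetAttr(x, "href"))
)
print(res)
```

Matching on the h3 child rather than on contains(., ...) keeps the predicate from also matching any outer div that happens to wrap all the sections.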