Suppose I have parsed a website with the following expression:
library(XML)
url.df_1 = htmlTreeParse("http://www.appannie.com/app/android/com.king.candycrushsaga/", useInternalNodes = T)
If I run the code below,
xpathSApply(url.df_1, "//div[@class='app_content_section']/h3", xmlValue)
I get the following:
[1] "Description"              "What's new"
[3] "Permissions"              "More Apps by King.com All Apps »"
[5] "Customers Also Viewed"    "Customers Also Installed"
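Now, I am only interested in the "Customers Also Installed" section. However, when I run the following code,
xpathSApply(url.df_1, "//div[@class='app_content_section']/ul/li/a", function(x) c(xmlValue(x), xmlAttrs(x)[["href"]]))
it returns every app listed under "More Apps by King.com All Apps", "Customers Also Viewed", and "Customers Also Installed".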
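So I tried
xpathSApply(url.df_1, "//div[h3='Customers Also Installed']", function(x) c(xmlValue(x), xmlAttrs(x)[["href"]]))
but that did not work. So I tried
xpathSApply(url.df_1, "//div[contains(.,'Customers Also Installed')]", xmlValue)
but that did not work either. The output should look like this: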
     [,1]
[1,] "Christmas Candy Free\n Daniel Development\n "
[2,] "/app/android/xmas.candy.free/"
     [,2]
[1,] "Jewel Candy Maker\n Nutty Apps\n "
[2,] "/app/android/com.candy.maker.jewel.nuttyapps/"
     [,3]
[1,] "Pogz 2\n Terry Paton\n "
[2,] "/app/android/com.terrypaton.unity.pogz2/"
Any guidance would be greatly appreciated!
Solution
Here is one option (you were really close):
xpathSApply(url.df_1, "//div[contains(.,'Customers Also Installed')]/*/li/a", xmlGetAttr, 'href')
[1] "/app/android/xmas.candy.free/"
[2] "/app/android/com.candy.maker.jewel.nuttyapps/"
[3] "/app/android/com.terrypaton.unity.pogz2/"
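The earlier attempts selected the div itself, which has no href attribute; the path has to continue down to the a elements. Since the live page may have changed or be unavailable, here is a minimal sketch of the same idea on a small inline HTML snippet. The markup below is a hypothetical, simplified stand-in for the real page (an assumption, not a copy of it), and the names html, doc, xp, hrefs, and res are illustrative only. It also shows one way to get both the link text and the href in a matrix, as in the desired output:

```r
library(XML)

# Hypothetical, simplified stand-in for the page markup (assumption:
# each section is a div holding an h3 heading and a ul of links).
html <- '
<html><body>
  <div class="app_content_section">
    <h3>Customers Also Viewed</h3>
    <ul>
      <li><a href="/app/android/example.viewed/">Viewed App</a></li>
    </ul>
  </div>
  <div class="app_content_section">
    <h3>Customers Also Installed</h3>
    <ul>
      <li><a href="/app/android/xmas.candy.free/">Christmas Candy Free</a></li>
      <li><a href="/app/android/com.terrypaton.unity.pogz2/">Pogz 2</a></li>
    </ul>
  </div>
</body></html>'

# Parse the string into an internal document that XPath can query.
doc <- htmlParse(html, asText = TRUE)

# Pick the div whose text contains the heading, then descend to its links.
xp <- "//div[contains(.,'Customers Also Installed')]/*/li/a"

# href values only, as in the answer above.
hrefs <- xpathSApply(doc, xp, xmlGetAttr, "href")

# Link text and href together: one column per link, two rows.
res <- xpathSApply(doc, xp, function(a) {
  c(name = xmlValue(a), href = xmlGetAttr(a, "href"))
})

print(hrefs)
print(res)
```

On this sample, the link from the "Customers Also Viewed" section is excluded because its div does not contain the target heading text. On the real page the link text may include extra whitespace and the publisher name, so the name row might need trimming.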