How to scrape/extract the dates of articles from a Google search using RStudio
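I am trying to scrape the dates of articles from a Google search. However, I think I am stuck on finding the right XPath to do this. I tried to find it with the browser's developer tools (inspecting the page source) and got //*[@id="rso"]/div[3]/div/div[2]/div/span/span[1], but it does not work.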
The closest I have got to the dates is this:
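```r
library(rvest)
library(dplyr)
# Fetch the Google results page for "uk house prices", limited to the last month (tbs=qdr:m)
web1 <- read_html("https://www.google.at/search?q=uk+house+prices&source=lnt&tbs=qdr:m&sa=X&ved=2ahUKEwin8NynhMjsAhUmQkEAHTqzBygQpwV6BAgVEB0&biw=927&bih=722")
# Select innermost <div> elements (no child <div>) nested under at least four <div> levels,
# then extract their text content
web1 %>%
  html_nodes(xpath = '//div/div/div/div/div[not(div)]') %>%
  html_text()
```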
```
[1] "Search options"
[2] "Any country"
[3] "Any Language"
[4] "Last month"
[5] "All results"
[6] "01.10.2020 · Why record UK house prices could be falling again soon. Analysis by Hanna Ziady, CNN Business. Updated 11:49 AM ET, Thu October 1, 2020. London UK ..."
<...>
[7] "Is the UK housing market about to crash?"
[31] "08.10.2020 · Which? explains what could happen to house prices after the Brexit transition period ends, including advice and predictions from mortgage and property experts."
```
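In the output above, every snippet that carries a date starts with it in dd.mm.yyyy form followed by " · ", while the other nodes (search options, titles) do not. A minimal sketch, assuming the text returned by html_text() keeps that layout (Google's markup and snippet format are not guaranteed to stay stable), is to pull the leading date out with a regular expression in base R. The vector snippets below is a hypothetical, hand-shortened stand-in for that output.

```r
# Hypothetical stand-in for part of the html_text() output;
# "\u00b7" is the middle dot that follows the date in each snippet
snippets <- c(
  "Search options",
  "Last month",
  "01.10.2020 \u00b7 Why record UK house prices ...",
  "Is the UK housing market about to crash?",
  "08.10.2020 \u00b7 Which? explains what ..."
)
# A dd.mm.yyyy date at the very start of the string
date_pattern <- "^[0-9]{2}\\.[0-9]{2}\\.[0-9]{4}"
# regexpr() returns -1 for elements without a match; regmatches() skips those,
# so only the matched date substrings are returned
dates <- regmatches(snippets, regexpr(date_pattern, snippets))
print(dates)
# [1] "01.10.2020" "08.10.2020"
```

Because regmatches() drops non-matching elements, dates no longer lines up index-by-index with snippets. If that alignment matters, ifelse(grepl(date_pattern, snippets), substr(snippets, 1, 10), NA) keeps one entry per snippet, with NA where there is no date.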
The only thing I need is the dates (01.10.2020, 08.10.2020).
How can I extract the dates from Google's SERP?
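Once strings such as 01.10.2020 are isolated, converting them to R Date objects makes them sortable and comparable. This sketch assumes day-first order, which matches the results: the CNN snippet dated 01.10.2020 says "Thu October 1, 2020". The vector date_strings is a hypothetical example, not scraped data.

```r
# Date strings as they appear in the snippets: day.month.year
date_strings <- c("01.10.2020", "08.10.2020")
# %d = day, %m = month, %Y = four-digit year; the dots in the format match literally
dates <- as.Date(date_strings, format = "%d.%m.%Y")
print(dates)
# [1] "2020-10-01" "2020-10-08"
# Date objects support arithmetic and sorting
print(diff(dates))
# Time difference of 7 days
```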