Can you help me with web scraping using rvest?

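I am currently trying to scrape the following website: https://chicago.suntimes.com/crime/archives

I have been relying on SelectorGadget to find XPaths for web scraping. However, the gadget does not work on this website, so I have to use Inspect Source to find what I need. I have tried scrolling through each source to find the relevant CSS selectors and XPaths, but with my limited skills I have not been able to.

Can you help me find the XPath or CSS selectors?

I apologize if this is a laundry list of everything, but I am really stuck. I would greatly appreciate any help you can give me!

Thank you very much.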

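Solution

For each element you want to extract, use SelectorGadget (or the browser's Inspect tool) to find the relevant tag and its class. Selecting on tag.class then returns the content you want.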

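library(rvest)

url <- 'https://chicago.suntimes.com/crime/archives'

# Download and parse the archive page once, then reuse the parsed document.
webpage <- read_html(url)

# Each selector has the form tag.class, as found in the page source.
title  <- webpage %>% html_nodes('h2.c-entry-box--compact__title') %>% html_text()
author <- webpage %>% html_nodes('span.c-byline__author-name') %>% html_text()
date   <- webpage %>% html_nodes('time.c-byline__item') %>% html_text() %>% trimws()

# data.frame() requires the three vectors to have the same length.
result <- data.frame(title, author, date)
result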

#                                                                  title         author        date
#1 Belmont Cragin man charged with carjacking in Little Village: police Sun-Times Wire February 17
#2                     Gas station robbed, man carjacked in Horner Park Jermaine Nolen February 17
#3                                8 shot, 2 fatally, Tuesday in Chicago Sun-Times Wire February 17
#4          Businesses robbed at gunpoint on the Northwest Side: police Sun-Times Wire February 17
#5                                Man charged with carjacking in Aurora Sun-Times Wire February 16
#6                        Woman fatally stabbed in Park Manor apartment Sun-Times Wire February 16
#7                         Woman critically hurt by gunfire in Woodlawn  David Struett February 16
#8 Teen boy, 17, charged with attempted carjacking in Back of the Yards Sun-Times Wire February 16
#...
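
One caveat with selecting each field across the whole page: if any article lacks an author or a date, the three vectors end up with different lengths and data.frame() fails. A possible workaround, sketched below, is to select one container node per article first and then pull each field from inside it. The container selector 'div.c-entry-box--compact' is an assumption inferred from the class names above and has not been checked against the live site; the inline HTML is made-up sample data so the example runs without network access, and extract_field() is a hypothetical helper. The sketch also assumes rvest 1.0.0 or later, which provides html_elements() and html_element().

```r
library(rvest)

# Made-up stand-in for the archive page; the second entry has no author.
page <- minimal_html('
<div class="c-entry-box--compact">
  <h2 class="c-entry-box--compact__title">First headline</h2>
  <span class="c-byline__author-name">Sun-Times Wire</span>
  <time class="c-byline__item"> February 17 </time>
</div>
<div class="c-entry-box--compact">
  <h2 class="c-entry-box--compact__title">Second headline</h2>
  <time class="c-byline__item"> February 16 </time>
</div>')

# Assumed container selector: one node per article.
entries <- page %>% html_elements('div.c-entry-box--compact')

# Hypothetical helper: html_element() (singular) returns exactly one result
# per entry, and html_text() turns a missing match into NA.
extract_field <- function(nodes, css) {
  nodes %>% html_element(css) %>% html_text(trim = TRUE)
}

result <- data.frame(
  title  = extract_field(entries, 'h2.c-entry-box--compact__title'),
  author = extract_field(entries, 'span.c-byline__author-name'),
  date   = extract_field(entries, 'time.c-byline__item'),
  stringsAsFactors = FALSE
)
result
```

To try this against the real page, replace the minimal_html() call with read_html(url) and confirm the container class in the page source first.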