Can you help me scrape a web page with rvest?
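I am currently trying to scrape the following website: https://chicago.suntimes.com/crime/archives
I have been relying on the SelectorGadget tool to find the CSS selectors and XPaths I need for scraping. However, the gadget does not work on this website, so I have to use Inspect Source to find what I need. I have tried to locate the relevant CSS and XPath by scrolling through the source, but I have not managed to work it out.
Could you help me find the XPath or CSS selector for each of these?
- Title
- Author
- Date
I'm sorry if this reads like a laundry list of requests, but I'm really stuck. I would appreciate any help you can give me!
Thank you very much.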
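Solution
For each element you want to extract, find the relevant tag and its class (with SelectorGadget or the browser's inspector). A selector of the form tag.class will then return the content you want.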
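As a minimal sketch of what a tag-plus-class selector does, the snippet below parses a small inline HTML fragment and selects the same node once with a CSS selector and once with a roughly equivalent XPath. The fragment is made up for illustration and only reuses the class names from this answer; it is not the site's actual markup.

```r
library(rvest)

# Hypothetical fragment; only the class names are taken from the answer.
page <- read_html('
  <h2 class="c-entry-box--compact__title"><a href="#">Example headline</a></h2>
  <span class="c-byline__author-name">Example Author</span>
  <time class="c-byline__item"> February 17 </time>
')

# CSS: tag name, a dot, then the class name.
title_css <- page %>%
  html_nodes("h2.c-entry-box--compact__title") %>%
  html_text()

# XPath: match the tag, then test its class attribute.
title_xpath <- page %>%
  html_nodes(xpath = "//h2[contains(@class, 'c-entry-box--compact__title')]") %>%
  html_text()

# Both approaches select the same heading text.
identical(title_css, title_xpath)
```

Either form can be passed to `html_nodes()`; the CSS version is usually shorter when the class name is known.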
library(rvest)

# Archive page to scrape
url <- 'https://chicago.suntimes.com/crime/archives'
webpage <- url %>% read_html()

# Select each field by tag and class, then extract its text
title <- webpage %>% html_nodes('h2.c-entry-box--compact__title') %>% html_text()
author <- webpage %>% html_nodes('span.c-byline__author-name') %>% html_text()
date <- webpage %>% html_nodes('time.c-byline__item') %>% html_text() %>% trimws()

# Combine into one data frame (the three vectors must have the same length)
result <- data.frame(title, author, date)
result
#  title                                                                 author          date
#1 Belmont Cragin man charged with carjacking in Little Village: police  Sun-Times Wire  February 17
#2 Gas station robbed, man carjacked in Horner Park                      Jermaine Nolen  February 17
#3 8 shot, 2 fatally, Tuesday in Chicago                                 Sun-Times Wire  February 17
#4 Businesses robbed at gunpoint on the Northwest Side: police           Sun-Times Wire  February 17
#5 Man charged with carjacking in Aurora                                 Sun-Times Wire  February 16
#6 Woman fatally stabbed in Park Manor apartment                         Sun-Times Wire  February 16
#7 Woman critically hurt by gunfire in Woodlawn                          David Struett   February 16
#8 Teen boy, 17, charged with attempted carjacking in Back of the Yards  Sun-Times Wire  February 16
#...
#...
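One caveat with building the data frame from three separate vectors: if an article lacks one of the fields (a byline, say), the vectors differ in length and `data.frame()` either stops with an error or pairs values with the wrong rows. A possible workaround, sketched here on made-up inline HTML, is to select each article container first and then call `html_node()` (singular), which returns one result per container and `NA` where the element is missing. The container class `c-entry-box--compact` is an assumption and would need to be checked against the live page.

```r
library(rvest)

# Made-up markup for illustration; the second article has no author.
page <- read_html('
  <div class="c-entry-box--compact">
    <h2 class="c-entry-box--compact__title">First headline</h2>
    <span class="c-byline__author-name">Example Author</span>
    <time class="c-byline__item"> February 17 </time>
  </div>
  <div class="c-entry-box--compact">
    <h2 class="c-entry-box--compact__title">Second headline</h2>
    <time class="c-byline__item"> February 16 </time>
  </div>
')

# One node per article (hypothetical container class).
articles <- page %>% html_nodes("div.c-entry-box--compact")

# html_node() yields exactly one value per article, NA if the field is absent.
result <- data.frame(
  title  = articles %>% html_node("h2.c-entry-box--compact__title") %>% html_text(),
  author = articles %>% html_node("span.c-byline__author-name") %>% html_text(),
  date   = articles %>% html_node("time.c-byline__item") %>% html_text() %>% trimws(),
  stringsAsFactors = FALSE
)
result
```

Because every column is derived from the same `articles` node set, each row is guaranteed to describe a single article.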