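R scraping IMDb: a better way to handle missing information?
I am following this tutorial to scrape information from IMDb: https://www.analyticsvidhya.com/blog/2017/03/beginners-guide-on-web-scraping-in-r-using-rvest-with-hands-on-knowledge/
However, some of the data is missing on IMDb. The tutorial suggests inspecting the page visually to find the gaps and then writing a loop like the following to fill them in: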
for (i in c(39,73,80,89)){
a<-Metascore_data[1:(i-1)]
b<-Metascore_data[i:length(Metascore_data)]
Metascore_data<-append(a,list("NA"))
Metascore_data<-append(Metascore_data,b)
}
Is there a better way to handle this programmatically?
Solution
The following worked for me:
library(rvest)
URL <- 'https://www.imdb.com/search/title/?title_type=feature&online_availability=US/IMDbTV&start=1251&ref_=adv_nxt'
webpage <- read_html(URL)
genres <- webpage %>%
html_nodes('span.genre') %>%
html_text() %>%
trimws()
This returns 50 values:
genres
# [1] "Comedy,Romance" "Action,Crime,Drama"
# [3] "Action,Horror,Sci-Fi" "Action,Adventure,Thriller"
# [5] "Adventure,Comedy,Family" "Comedy"
# [7] "Action,Thriller" "Comedy,Drama,Romance"
# [9] "Comedy" "Comedy"
#[11] "Action,Drama" "Action,Thriller"
#[13] "Action,Thriller" "Mystery,Thriller"
#[15] "Crime,Thriller" "Drama,Horror"
#[17] "Animation,War" "Drama,Thriller"
#[19] "Action,Drama" "Drama,Sci-Fi"
#[21] "Adventure,Family" "Crime,Drama"
#[23] "Action,Thriller" "Action,Sci-Fi"
#[25] "Thriller" "Comedy,Crime"
#[27] "Comedy,Biography,Drama"
#[29] "Adventure,Comedy" "Crime,Thriller"
#[31] "Drama,Sci-Fi,Thriller" "Comedy,Romance"
#[33] "Action,Thriller" "Action,Sci-Fi"
#[35] "Action,Drama" "Action,Drama"
#[37] "Action,Thriller" "Action,War"
#[39] "Drama,Thriller" "Animation,Family"
#[41] "Drama,Romance" "Action,Fantasy"
#[43] "Action,Fantasy" "Comedy,Drama"
#[45] "Action,Sci-Fi"
#[47] "Drama,Romance" "Animation,Family,Fantasy"
#[49] "Action,Fantasy" "Mystery,Thriller"
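For a field such as Metascore, where some results have no node at all, one common rvest pattern is to select each result's container first and then call html_node() (singular) inside it. html_node() returns exactly one result per container and html_text() turns a missing node into NA, so the positions stay aligned without hard-coded indices. The sketch below is a minimal illustration on an inline HTML snippet rather than the live site; the selectors .lister-item-content, h3 a and .metascore are assumptions based on the older IMDb list layout used in the tutorial and may need adjusting.

```r
library(rvest)

# Hypothetical stand-in for an IMDb results page:
# three results, the second one has no Metascore node.
page <- read_html('
<div class="lister-item-content"><h3><a>Film A</a></h3><span class="metascore">85</span></div>
<div class="lister-item-content"><h3><a>Film B</a></h3></div>
<div class="lister-item-content"><h3><a>Film C</a></h3><span class="metascore"> 62 </span></div>
')

# One node per result (assumed container selector).
items <- html_nodes(page, ".lister-item-content")

# html_node() (singular) yields one entry per container,
# so a missing child becomes NA instead of being dropped.
titles <- items %>%
  html_node("h3 a") %>%
  html_text()

metascore <- items %>%
  html_node(".metascore") %>%
  html_text() %>%
  trimws() %>%
  as.numeric()

movies <- data.frame(title = titles, metascore = metascore,
                     stringsAsFactors = FALSE)
print(movies)
```

Because every column is built from the same items node set, the vectors always have the same length and can be combined into a data frame directly.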