
Scraping IMDb with R: is there a better way to handle missing information?

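I am following this tutorial to scrape information from IMDb: https://www.analyticsvidhya.com/blog/2017/03/beginners-guide-on-web-scraping-in-r-using-rvest-with-hands-on-knowledge/

However, some data is missing on IMDb. The tutorial suggests inspecting the page visually to find the positions of the missing entries and then writing a loop like the following: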

for (i in c(39, 73, 80, 89)) {
  a <- Metascore_data[1:(i - 1)]
  b <- Metascore_data[i:length(Metascore_data)]
  Metascore_data <- append(a, list("NA"))
  Metascore_data <- append(Metascore_data, b)
}

Is there a better way to handle this programmatically?

Solution

The following worked for me:

library(rvest)
URL <- 'https://www.imdb.com/search/title/?title_type=feature&online_availability=US/IMDbTV&start=1251&ref_=adv_nxt'
webpage <- read_html(URL)
genres <- webpage %>%
  html_nodes('span.genre') %>%
  html_text() %>%
  trimws()

This returns 50 values:

genres
# [1] "Comedy,Romance"            "Action,Crime,Drama"
# [3] "Action,Horror,Sci-Fi"      "Action,Adventure,Thriller"
# [5] "Adventure,Comedy,Family"   "Comedy"
# [7] "Action,Thriller"           "Comedy,Drama,Romance"
# [9] "Comedy"                    "Comedy"
#[11] "Action,Drama"              "Action,Thriller"
#[13] "Action,Thriller"           "Mystery,Thriller"
#[15] "Crime,Thriller"            "Drama,Horror"
#[17] "Animation,War"             "Drama,Thriller"
#[19] "Action,Drama"              "Drama,Sci-Fi"
#[21] "Adventure,Family"          "Crime,Drama"
#[23] "Action,Thriller"           "Action,Sci-Fi"
#[25] "Thriller"                  "Comedy,Crime"
#[27] "Comedy,Biography,Drama"
#[29] "Adventure,Comedy"          "Crime,Thriller"
#[31] "Drama,Sci-Fi,Thriller"     "Comedy,Romance"
#[33] "Action,Thriller"           "Action,Sci-Fi"
#[35] "Action,Drama"              "Action,Drama"
#[37] "Action,Thriller"           "Action,War"
#[39] "Drama,Thriller"            "Animation,Family"
#[41] "Drama,Romance"             "Action,Fantasy"
#[43] "Action,Fantasy"            "Comedy,Drama"
#[45] "Action,Sci-Fi"
#[47] "Drama,Romance"             "Animation,Family,Fantasy"
#[49] "Action,Fantasy"            "Mystery,Thriller"
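
For fields that some titles lack entirely, such as the Metascore in the question, one possible approach is to first select one container node per movie and then call html_node() (singular) inside each container. Unlike html_nodes(), it returns exactly one result per container and gives NA where nothing matches, so the columns stay aligned without inserting "NA" by hand. The sketch below is only an illustration: the HTML fragment is made up, and the class names lister-item-content, genre and metascore are assumptions about IMDb's search-page markup that may not match the live site.

```r
library(rvest)

# Made-up two-movie fragment for illustration; the class names are
# assumptions, and the second movie deliberately has no Metascore.
page <- read_html('
  <div class="lister-item-content">
    <h3><a>Movie A</a></h3>
    <span class="genre"> Comedy, Romance </span>
    <span class="metascore"> 71 </span>
  </div>
  <div class="lister-item-content">
    <h3><a>Movie B</a></h3>
    <span class="genre"> Action, Thriller </span>
  </div>
')

# One node per movie, so each field is looked up inside its own movie.
items <- page %>% html_nodes("div.lister-item-content")

# html_node() (singular) yields one result per item, NA when absent.
titles    <- items %>% html_node("h3 a") %>% html_text(trim = TRUE)
genres    <- items %>% html_node("span.genre") %>% html_text(trim = TRUE)
metascore <- items %>% html_node("span.metascore") %>% html_text(trim = TRUE)

movies <- data.frame(
  title     = titles,
  genre     = genres,
  metascore = as.integer(metascore),  # a missing score stays NA
  stringsAsFactors = FALSE
)
print(movies)
```

Here the metascore column has one entry per movie, with NA for "Movie B", so no positions need to be found by visual inspection.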
