Why can't I turn cleaning a PDF table and renaming its columns into a function?
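I figured out how to scrape this PDF, but I need to process a lot of these files. My plan is to wrap the steps in a function, import the data from all of the PDFs (one per month over several years), combine them with rbind() into a single data table, and then write that out as a CSV.
This works: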
library(tidyverse)
library(tabulizer)
#import the data
jan16s_raw <- extract_tables("https://www.nvsos.gov/sos/home/showdocument?id=4062")
#create data frame
cleanNvsen <- do.call(rbind,jan16s_raw)
cleanNvsen2 <-as.data.frame(cleanNvsen[3:nrow(cleanNvsen),])
#rename all of the columns
names(cleanNvsen2)[1] <- "District"
names(cleanNvsen2)[2] <- "Democrat"
names(cleanNvsen2)[3] <- "Independent American"
names(cleanNvsen2)[4] <- "Libertarian"
names(cleanNvsen2)[5] <- "Nonpartisan"
names(cleanNvsen2)[6] <- "Other"
names(cleanNvsen2)[7] <- "Republican"
names(cleanNvsen2)[8] <- "Total"
#check to see if it worked
head(cleanNvsen2)
But this results in a 1 x 1 data frame:
library(tidyverse)
library(tabulizer)
#load data
jan16s_raw <- extract_tables("https://www.nvsos.gov/sos/home/showdocument?id=4062")
#create function to create data frame and then rename
clean <- function(x) {
cleanNvsen <- do.call(rbind,x)
cleanNvsen2 <-as.data.frame(cleanNvsen[3:nrow(cleanNvsen),])
names(cleanNvsen2)[1] <- "District"
names(cleanNvsen2)[2] <- "Democrat"
names(cleanNvsen2)[3] <- "Independent American"
names(cleanNvsen2)[4] <- "Libertarian"
names(cleanNvsen2)[5] <- "Nonpartisan"
names(cleanNvsen2)[6] <- "Other"
names(cleanNvsen2)[7] <- "Republican"
names(cleanNvsen2)[8] <- "Total"
}
x2 <- clean(jan16s_raw)
head(x2)
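I would really like to get this working so that I can just give R a URL and then run the clean function I created. I have a lot of files to process.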
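Solution
Your function never returns the data frame. An R function returns the value of its last expression, and here that is the final names() assignment, whose value is just the string "Total". You can write the clean function so that it extracts the data, renames the columns, and explicitly returns the result. The columns can also be renamed all at once instead of one at a time.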
clean <- function(url) {
jan16s_raw <- extract_tables(url)
#create data frame
cleanNvsen <- do.call(rbind,jan16s_raw)
cleanNvsen2 <- as.data.frame(cleanNvsen[3:nrow(cleanNvsen),])
#rename all of the columns
names(cleanNvsen2) <- c("District","Democrat","Independent American","Libertarian","Nonpartisan","Other","Republican","Total")
return(cleanNvsen2)
}
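To see why the original version came back as a single value rather than a table, here is a minimal sketch that needs no PDF at all. The matrix m and the functions no_return and with_return are made up for illustration; m only stands in for the rbind-ed output of extract_tables().

```r
# Toy character matrix standing in for the combined extract_tables() output
# (made-up values; the first row plays the role of a header row to drop)
m <- matrix(c("h1", "h2",
              "1",  "10",
              "2",  "20"), ncol = 2, byrow = TRUE)

# Same shape as the function in the question: the last expression is an
# assignment, so the function's value is the assigned string, not the data frame
no_return <- function(x) {
  df <- as.data.frame(x[2:nrow(x), ])
  names(df)[2] <- "Total"
}

# Same steps, but the data frame is the last expression, so it is returned
with_return <- function(x) {
  df <- as.data.frame(x[2:nrow(x), ])
  names(df) <- c("District", "Total")
  df
}

bad  <- no_return(m)
good <- with_return(m)

class(bad)   # a character vector holding "Total"
class(good)  # a data frame with two rows
```

Ending the function with the bare object name (df) has the same effect as return(df); either style works.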
Create a vector of all the URLs you want to extract data from.
list_of_urls <- c('https://www.nvsos.gov/sos/home/showdocument?id=4062','https://www.nvsos.gov/sos/home/showdocument?id=4064')
Then call the clean function for each URL and combine the data.
all_data <- purrr::map_df(list_of_urls,clean)
#OR
#all_data <- do.call(rbind,lapply(list_of_urls,clean))
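The question also mentions writing the combined table to a CSV, and with one PDF per month it may help to record which file each row came from. The following is only a sketch under assumptions: fake_clean is a hypothetical stand-in for clean() that returns made-up rows so the example runs without downloading anything, and the example.com URLs, the labels jan16 and feb16, and the output file name are all invented for illustration.

```r
# Hypothetical stand-in for clean(): returns made-up rows instead of
# scraping a PDF, so the sketch runs offline
fake_clean <- function(url) {
  data.frame(District = c("1", "2"),
             Total    = c("10", "20"),
             stringsAsFactors = FALSE)
}

# Named vector: the names are labels for each month (placeholder URLs)
urls <- c(jan16 = "https://example.com/jan16.pdf",
          feb16 = "https://example.com/feb16.pdf")

# Clean each file and tag its rows with the label it came from
pieces <- lapply(names(urls), function(label) {
  df <- fake_clean(urls[[label]])
  df$source <- label
  df
})

# Stack everything into one data frame
all_data <- do.call(rbind, pieces)

# Write the result out; tempdir() is used here only to avoid cluttering
# the working directory
out_file <- file.path(tempdir(), "registration_all_months.csv")
write.csv(all_data, out_file, row.names = FALSE)

# Read it back to confirm the file contains what was written
check <- read.csv(out_file, colClasses = "character")
```

With the real clean() in place of fake_clean and the real URLs in the named vector, the rest should carry over unchanged. row.names = FALSE keeps R's automatic row numbers out of the file.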