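How to fix an applied function that turns a data frame into a character vector when the API returns an error
I wrote a function to query the CMS National Plan and Provider Enumeration System (NPPES) API.
I want to pass in a data frame of NPI values and get their addresses back.
Some NPI values are no longer valid, so I tried to build some error handling for those cases.
My error-handling if/else
statement builds a data frame of 1 row x 6 columns and inserts the bad NPI value into row 1, column 1.
When I use apply on my data frame, every successful API call gives me a [1x6] list, but the bad value comes back as a plain character vector.
I have tried to debug this, but I cannot work out where the conversion from data frame to character happens. I would be grateful for any help.
Here is the data frame of values I want to query: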
install.packages("pacman")
library(pacman)
pacman::p_load(tidyverse,data.table,httr,jsonlite)
values <- c(1598727430,1083632731,1710983663) # LAST VALUE PRODUCES THE ERROR CASE
npi_values <- data.frame(values)
Here is the API URL:
path <- "https://npiregistry.cms.hhs.gov/api/?"
My function:
# CREATE A FUNCTION TO PULL NPI INFORMATION FROM THE NPI REGISTRY
getNPI <- function(object) {
  request <- httr::GET(
    url = path,
    query = list(version = "2.0", number = object)
  )
  Sys.sleep(0.25)
  warn_for_status(request)
  npi_details <- content(request, as = "text", encoding = "UTF-8") %>%
    fromJSON(., flatten = TRUE) %>%
    data.frame()
  # IF THE API THROWS BACK A RESULT WHERE THE COLUMN NAMES CONTAIN 'ERROR'
  # THEN ASSIGN ALL THE ROW VALUES TO NA AND ADD THE NPI VALUE TO THE FIRST
  # COLUMN
  if (any(grepl("ERROR", toupper(colnames(npi_details))))) {
    npi_details <- as.data.frame(matrix(NA, ncol = 6, nrow = 1)) %>%
      dplyr::rename(`NPI NUMBER` = V1, `CMS REF ADDRESS 1` = V2, `CMS REF ADDRESS 2` = V3,
                    `CMS REF CITY` = V4, `CMS REF STATE` = V5, `CMS REF ZIP` = V6)
    npi_details[1, 1] <- object
  # ELSE IF THE DATA FRAME DOES NOT CONTAIN 'ERROR' THEN RUN THIS CHUNK
  } else {
    select(npi_details, contains(c("addresses", "number"))) %>%
      unnest(c(contains("address"))) %>%
      filter(address_purpose == "MAILING") %>%
      rename_all(.funs = toupper) %>%
      select(
        `NPI NUMBER` = RESULTS.NUMBER, -COUNTRY_CODE, -COUNTRY_NAME, -ADDRESS_PURPOSE, -ADDRESS_TYPE,
        `CMS REF ADDRESS 1` = ADDRESS_1, `CMS REF ADDRESS 2` = ADDRESS_2,
        `CMS REF CITY` = CITY, `CMS REF STATE` = STATE, `CMS REF ZIP` = POSTAL_CODE
      )
  }
}
I then apply this function to the data frame of NPI values above:
out <- apply(npi_values,1,getNPI)
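When I apply it to my real dataset, the error case comes back as a character vector, even though I specified a data frame of 1 row x 6 columns.
Based on feedback from @akrun, I changed my apply call to wrap the result of getNPI in a list:
out <- apply(npi_values, 1, function(x) list(getNPI(x)))
The structure of out now looks like this: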
str(out)
List of 3
$ :List of 1
..$ : tibble [1 × 6] (S3: tbl_df/tbl/data.frame)
.. ..$ NPI NUMBER : int 1598727430
.. ..$ CMS REF ADDRESS 1: chr "PO Box 17567"
.. ..$ CMS REF ADDRESS 2: chr ""
.. ..$ CMS REF CITY : chr "PENSACOLA"
.. ..$ CMS REF STATE : chr "FL"
.. ..$ CMS REF ZIP : chr "325227567"
$ :List of 1
..$ : tibble [1 × 6] (S3: tbl_df/tbl/data.frame)
.. ..$ NPI NUMBER : int 1083632731
.. ..$ CMS REF ADDRESS 1: chr "PO Box 17326"
.. ..$ CMS REF ADDRESS 2: chr ""
.. ..$ CMS REF CITY : chr "DENVER"
.. ..$ CMS REF STATE : chr "CO"
.. ..$ CMS REF ZIP : chr "802170326"
$ :List of 1
..$ : Named num 1.71e+09
.. ..- attr(*,"names")= chr "values"
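When I try to collapse these lists into a data frame of 3 rows x 6 columns, the last case (the error case) ends up in an unwanted 7th column. I want the NPI values of all 3 cases stored in the first column, with the remaining values of the error case filled with NA.
Expected result:
`NPI NUMBER` <- c(1598727430, 1083632731, 1710983663)
`CMS REF ADDRESS 1` <- c("PO Box 17567", "PO Box 17326", NA)
`CMS REF ADDRESS 2` <- c("", "", NA)
`CMS REF CITY` <- c("PENSACOLA", "DENVER", NA)
`CMS REF STATE` <- c("FL", "CO", NA)
`CMS REF ZIP` <- c("325227567", "802170326", NA)
# check.names = FALSE keeps the spaces in the column names
desired <- data.frame(`NPI NUMBER`, `CMS REF ADDRESS 1`, `CMS REF ADDRESS 2`, `CMS REF CITY`, `CMS REF STATE`, `CMS REF ZIP`, check.names = FALSE)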
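Once every element of out holds a data frame with the same six column names and the same column types, collapsing the nested list takes a single rbind. The sketch below rebuilds out by hand from the str(out) values above instead of calling the API; make_row() is a hypothetical helper, the error row follows the NA-filled layout described above, and NPI NUMBER is kept as character in every row so the column types agree.

```r
# Hypothetical helper: one 1 x 6 row using the column names from the question
make_row <- function(npi, addr1, addr2, city, state, zip) {
  data.frame(
    `NPI NUMBER` = npi, `CMS REF ADDRESS 1` = addr1, `CMS REF ADDRESS 2` = addr2,
    `CMS REF CITY` = city, `CMS REF STATE` = state, `CMS REF ZIP` = zip,
    check.names = FALSE, stringsAsFactors = FALSE
  )
}

# Same nesting as str(out): a list of 3, each a list holding one data frame
out <- list(
  list(make_row("1598727430", "PO Box 17567", "", "PENSACOLA", "FL", "325227567")),
  list(make_row("1083632731", "PO Box 17326", "", "DENVER", "CO", "802170326")),
  # Error case: NPI in the first column, NA everywhere else
  list(make_row("1710983663", NA_character_, NA_character_,
                NA_character_, NA_character_, NA_character_))
)

# Pull the data frame out of each inner list, then stack the rows by column name
combined <- do.call(rbind, lapply(out, `[[`, 1))
print(combined)
```

With tibbles instead of base data frames, dplyr::bind_rows() plays the same role as do.call(rbind, ...).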
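Solution
apply()
converts the object it is given into a matrix, and every value in a matrix must share one type. If any column is character, the common type is character, so your function is applied to the rows of a character matrix, and each row arrives as a named vector rather than a data frame.
See ?apply
:
If X is not an array but an object of a class with a non-null dim value (such as a data frame), apply attempts to coerce it to an array via as.matrix if it is two-dimensional (e.g., a data frame) or via as.array.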
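A small base-R sketch of that coercion on a toy data frame (the label column is an assumption added only to force a character matrix): each row that apply() hands to the function is a named atomic vector, character as soon as any column is character and numeric when all columns are numeric. That matches the Named num with names "values" in str(out) above. Iterating over the column itself with vapply() or lapply() avoids the matrix conversion.

```r
# Toy data frame: the numeric NPI column plus a hypothetical character column
mixed <- data.frame(values = c(1598727430, 1083632731), label = c("a", "b"))

# apply() calls as.matrix() first, so every row arrives as a character vector
row_classes_mixed <- apply(mixed, 1, function(row) class(row))

# With only numeric columns the matrix, and therefore each row, stays numeric
numeric_only <- data.frame(values = c(1598727430, 1083632731))
row_classes_numeric <- apply(numeric_only, 1, function(row) class(row))

# Each row is a named vector, which is why str(out) showed names = "values"
row_names <- apply(numeric_only, 1, function(row) names(row))

# Iterating over the column directly skips the matrix conversion
elem_classes <- vapply(mixed$values, class, character(1))

print(typeof(as.matrix(mixed)))
print(row_classes_mixed)
print(row_classes_numeric)
print(elem_classes)
```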
In the end, it turned out that I needed to return the value of npi_details in the error branch of the if/else statement, so that the tibble I built for the error case is what the function actually returns. An R function returns the value of its last evaluated expression, and in my original error branch that expression was the assignment npi_details[1,1] <- object, whose value is object itself rather than npi_details. I also store NPI NUMBER as character in both branches so the rows have matching column types:
# CREATE A FUNCTION TO PULL NPI INFORMATION FROM THE NPI REGISTRY
getNPI <- function(object) {
  request <- httr::GET(
    url = path,
    query = list(version = "2.0", number = object)
  )
  Sys.sleep(0.25)
  warn_for_status(request)
  npi_details <- content(request, as = "text", encoding = "UTF-8") %>%
    fromJSON(., flatten = TRUE) %>%
    data.frame()
  # IF THE API THROWS BACK A RESULT WHERE THE COLUMN NAMES CONTAIN 'ERROR'
  # THEN FILL THE ROW WITH THE STRING "error", PUT THE NPI VALUE IN THE FIRST
  # COLUMN AND RETURN THE TIBBLE
  if (any(grepl("ERROR", toupper(colnames(npi_details))))) {
    npi_details <- as.data.frame(matrix("error", ncol = 6, nrow = 1), stringsAsFactors = FALSE) %>%
      dplyr::rename(
        `NPI NUMBER` = V1, `CMS REF ADDRESS 1` = V2, `CMS REF ADDRESS 2` = V3,
        `CMS REF CITY` = V4, `CMS REF STATE` = V5, `CMS REF ZIP` = V6
      ) %>%
      as_tibble()
    npi_details[1, 1] <- as.character(object)
    return(npi_details)
  # ELSE IF THE DATA FRAME DOES NOT CONTAIN 'ERROR' THEN RUN THIS CHUNK
  } else {
    select(npi_details, contains(c("addresses", "number"))) %>%
      unnest(c(contains("address"))) %>%
      filter(address_purpose == "MAILING") %>%
      rename_all(.funs = toupper) %>%
      select(
        `NPI NUMBER` = RESULTS.NUMBER, -COUNTRY_CODE, -COUNTRY_NAME, -ADDRESS_PURPOSE, -ADDRESS_TYPE,
        `CMS REF ADDRESS 1` = ADDRESS_1, `CMS REF ADDRESS 2` = ADDRESS_2,
        `CMS REF CITY` = CITY, `CMS REF STATE` = STATE, `CMS REF ZIP` = POSTAL_CODE
      ) %>%
      mutate(`NPI NUMBER` = as.character(`NPI NUMBER`))
  }
}
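The return-value issue can be reproduced without the API. In this sketch, lookup_buggy(), lookup_fixed(), empty_row() and the mock_registry lookup table are hypothetical stand-ins for getNPI() and the NPPES response, and only two of the six columns are kept for brevity. It shows that ending a branch with an assignment returns the assigned value, and that iterating over the column with lapply() plus an explicit return gives one data frame per NPI, ready for rbind. lookup_fixed() is a simplified variant, not the exact code above.

```r
# Hypothetical stand-in for the API: the NPIs this fake "registry" knows about
mock_registry <- c("1598727430" = "PENSACOLA", "1083632731" = "DENVER")

# Hypothetical helper: an NA-filled 1 x 2 row with the question's column names
empty_row <- function() {
  details <- as.data.frame(matrix(NA_character_, nrow = 1, ncol = 2),
                           stringsAsFactors = FALSE)
  names(details) <- c("NPI NUMBER", "CMS REF CITY")
  details
}

# Buggy: the last expression is an assignment, so the function returns the
# assigned value (the NPI string), not the data frame
lookup_buggy <- function(npi) {
  details <- empty_row()
  details[1, 1] <- sprintf("%.0f", npi)
}

# Fixed: fill in what is known, then return the data frame explicitly
lookup_fixed <- function(npi) {
  key <- sprintf("%.0f", npi)
  details <- empty_row()
  details[1, 1] <- key
  if (key %in% names(mock_registry)) {
    details[1, 2] <- mock_registry[[key]]
  }
  details
}

values <- c(1598727430, 1083632731, 1710983663)
buggy_result <- lookup_buggy(values[3])

# lapply() iterates over the column itself, so no matrix coercion happens
result <- do.call(rbind, lapply(values, lookup_fixed))
print(buggy_result)
print(result)
```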