
Labels for ordered factor variables

How to fix the labels of ordered factor variables

I am trying to generate a univariate output table using gtsummary.

structure(list(id = 1:10,age = structure(c(3L,3L,2L,1L,1L),.Label = c("c","b","a"),class = c("ordered","factor")),sex = structure(c(2L,2L),.Label = c("F","M"),class = "factor"),country = structure(c(1L,.Label = c("eng","scot","wale"),edu = structure(c(1L,3L),.Label = c("x","y","z"),lungfunction = c(45L,23L,25L,45L,70L,69L,90L,50L,62L,45L),ivdays = c(15L,26L,36L,34L,4L,5L,8L,9L,15L),no2 = c(40L,60L,30L,80L,89L,10L,40L),pm25 = c(15L,20L,48L,28L,15L,15L)),row.names = c(NA,10L),class = "data.frame")

...

library(gtsummary)
publication_dummytable1_sum %>%
  select(sex, age, lungfunction, ivdays) %>%
  tbl_uvregression(
    method = lm,
    y = lungfunction,
    pvalue_fun = ~style_pvalue(.x, digits = 3)
  ) %>%
  add_global_p() %>% # add global p-value
  bold_p() %>% # bold p-values under a given threshold
  bold_labels()

...

When I run this code, I get the output below. The problem is the labelling of the ordered factor variable (age): R chooses its own labels for ordered factor variables. Is it possible to tell R not to choose its own labels for ordered factor variables?

[image: current output]

I would like the output to look like this:

[image: desired output]

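Solution

Like many others, I think you may be misunderstanding what an "ordered" factor means in R. In a sense, all factors in R are ordered: estimates and so on are typically printed and plotted in the order of the levels vector. Declaring a factor to be of type ordered has two main effects: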

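  • It allows you to evaluate inequalities on the factor levels (e.g. you can filter(age > "b")).
  • Contrasts are set by default to orthogonal polynomial contrasts, which is where the L (linear) and Q (quadratic) labels come from: see e.g. this CrossValidated answer for more details.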
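
Both effects can be seen in base R without fitting a model. The following is a minimal sketch on a hypothetical toy factor (the values are invented here and are not the asker's data); it assumes the default setting of options("contrasts").

# Hypothetical toy factor with level order c < b < a
age <- factor(c("c", "b", "a", "b", "c", "a"),
              levels = c("c", "b", "a"), ordered = TRUE)

# Effect 1: inequalities are defined for ordered factors
age > "b"        # TRUE only for the elements equal to "a"
age[age > "b"]   # keeps the elements above level "b"

# Effect 2: the default contrasts are orthogonal polynomials
colnames(contrasts(age))             # ".L" ".Q"

# The same values as an unordered factor get treatment contrasts,
# whose columns are named after the non-baseline levels
age_unordered <- factor(age, ordered = FALSE)
colnames(contrasts(age_unordered))   # "b" "a"

The column names of the contrast matrix are what lm() appends to the variable name, which is why the coefficients show up as age.L and age.Q.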

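If you want this variable to be handled the same way as a regular factor (so that the estimates are differences between each group and the baseline level, i.e. treatment contrasts), you can: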

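  • convert it back to an unordered factor (e.g. factor(age, ordered = FALSE))
  • specify that you want treatment contrasts in the model (in base R, you would specify contrasts = list(age = "contr.treatment"))
  • set options(contrasts = c(unordered = "contr.treatment", ordered = "contr.treatment")) (the default for ordered is "contr.poly")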
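
As a rough illustration, the three options above can be compared with plain lm(). The data frame d below is hypothetical and invented for this sketch; only the technique carries over to the asker's data.

# Hypothetical toy data, invented for illustration only
d <- data.frame(
  y   = c(45, 23, 25, 45, 70, 69, 90, 50, 62),
  age = factor(rep(c("c", "b", "a"), each = 3),
               levels = c("c", "b", "a"), ordered = TRUE)
)

# Default behaviour: polynomial contrasts on the ordered factor
fit_poly <- lm(y ~ age, data = d)
names(coef(fit_poly))

# Option 1: drop the ordered class before fitting
d_unordered <- transform(d, age = factor(age, ordered = FALSE))
fit1 <- lm(y ~ age, data = d_unordered)

# Option 2: request treatment contrasts for this one model only
fit2 <- lm(y ~ age, data = d, contrasts = list(age = "contr.treatment"))

# Option 3: change the session-wide default, fit, then restore it
old <- options(contrasts = c(unordered = "contr.treatment",
                             ordered = "contr.treatment"))
fit3 <- lm(y ~ age, data = d)
options(old)

# All three fits estimate differences from the baseline level "c"
names(coef(fit1))

Option 1 changes the data, option 2 changes a single model call, and option 3 changes every later model in the session, so restoring the old options afterwards is usually a good idea.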

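If you have an unordered ("regular") factor and the levels are not in the order you want, you can reset the level order by specifying the levels explicitly, e.g.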

mutate(across(age, ~factor(.x, levels = c("0-10 years", "11-20 years", "21-30 years", "30-40 years"))))

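By default R sets factor levels in alphabetical order, which is sometimes not what you want (but I can't think of a case where the order would be "random" ...)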
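
A small base R sketch of that default, using a hypothetical severity vector (invented here, not part of the question):

# Hypothetical values, invented for illustration
severity <- c("medium", "low", "high", "low")

# Default: levels are the sorted unique values
levels(factor(severity))   # "high" "low" "medium"

# Explicit levels fix the order, and with it the baseline (first) level
severity_fixed <- factor(severity, levels = c("low", "medium", "high"))
levels(severity_fixed)     # "low" "medium" "high"

With treatment contrasts the first level is the reference group, so reordering the levels also changes which group the regression estimates are compared against.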


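The easiest way to get rid of the odd labels for ordered variables is to remove the ordered class from those factor variables. Example below!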

library(gtsummary)
library(tidyverse)
packageVersion("gtsummary")
#> [1] '1.4.2'

publication_dummytable1_sum <- 
  structure(list(id = 1:10,age = structure(c(3L,3L,2L,1L,1L),.Label = c("c","b","a"),class = c("ordered","factor")),sex = structure(c(2L,2L),.Label = c("F","M"),class = "factor"),country = structure(c(1L,.Label = c("eng","scot","wale"),edu = structure(c(1L,3L),.Label = c("x","y","z"),lungfunction = c(45L,23L,25L,45L,70L,69L,90L,50L,62L,45L),ivdays = c(15L,26L,36L,34L,4L,5L,8L,9L,15L),no2 = c(40L,60L,30L,80L,89L,10L,40L),pm25 = c(15L,20L,48L,28L,15L,15L)),row.names = c(NA,10L),class = "data.frame") |>
  as_tibble()

# R labels ordered factors like this in lm()
lm(lungfunction ~ age, publication_dummytable1_sum)
#> 
#> Call:
#> lm(formula = lungfunction ~ age,data = publication_dummytable1_sum)
#> 
#> Coefficients:
#> (Intercept)        age.L        age.Q  
#>       51.17       -10.37       -15.11


tbl <-
  publication_dummytable1_sum %>% 
  # remove ordered class
  mutate(across(where(is.ordered),~factor(.,ordered = FALSE))) %>%
  select(sex,age,lungfunction,ivdays) %>% 
  tbl_uvregression(
    method =lm,y = lungfunction,pvalue_fun = ~style_pvalue(.x,digits = 3)
  )

[image: resulting table]

Created on 2021-07-22 by the reprex package (v2.0.0)
