
Adding descriptive statistics to a function when using the ellipsis as an input variable

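For an assignment, I wrote a function in R that computes the regression coefficients, predicted values, and residuals of the data, which is useful for multiple linear regression. It works as follows: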

MLR <- function(y_var,...){  
  
  y <- y_var  
  X <- as.matrix(cbind(...))  
  
  intercept <- rep(1,length(y)) 
  
  X <- cbind(intercept,X) 
  
  regression_coef <- solve(t(X) %*% X) %*% t(X) %*% y  
  
  predicted_val <- X %*% regression_coef 
  
  residual_val <- y - predicted_val 
 
  
  scatterplot <- plot(predicted_val,residual_val,ylab = 'Residuals',xlab = 'Predicted values',main = 'Predicted values against the residuals',abline(0,0))
 
  list('y' = y,'X' = X,'Regression coefficients' = regression_coef,'Predicted values' = predicted_val,'Residuals' = residual_val,'Scatterplot' = scatterplot
       )
}

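Now my task is to add descriptive statistics for the input variables. Because I want to allow any number of independent variables, I used the ellipsis as the input. Is there a way to compute useful descriptive statistics (mean, variance, standard deviation) for my independent variables (the ones passed through ...)?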

mean(...)

does not work...

Thanks in advance for your answers!

Solution

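Try a few small changes to your function. I have applied it to some variables from the iris dataset. You can compute the statistics you need on X (before the intercept column is added) and return them as an additional element of the output list. Here is the code: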

#Function
MLR <- function(y_var,...){  
  
  y <- y_var
  X <- as.matrix(cbind(...))  
  RX <- X
  
  intercept <- rep(1,length(y)) 
  
  X <- cbind(intercept,X) 
  
  regression_coef <- solve(t(X) %*% X) %*% t(X) %*% y  
  
  predicted_val <- X %*% regression_coef 
  
  residual_val <- y - predicted_val 
  
  
  scatterplot <- plot(predicted_val,residual_val,ylab = 'Residuals',xlab = 'Predicted values',main = 'Predicted values against the residuals',abline(0,0))
  
  #Summary
  #Stats
  DMeans <- apply(RX,2,mean,na.rm=TRUE)
  DSD <- apply(RX,2,sd,na.rm=TRUE)
  DVar <- apply(RX,2,var,na.rm=TRUE)
  DSummary <- rbind(DMeans,DSD,DVar)
  #Out
  list('y' = y,'X' = X,'Regression coefficients' = regression_coef,'Predicted values' = predicted_val,'Residuals' = residual_val,'Scatterplot' = scatterplot,'Summary' = DSummary
  )
}
#Apply
MLR(y_var = iris$Sepal.Length,iris$Sepal.Width,iris$Petal.Length)

The last elements of the output will look like this:

$Scatterplot
NULL

$Summary
            [,1]     [,2]
DMeans 3.0573333 3.758000
DSD    0.4358663 1.765298
DVar   0.1899794 3.116278
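
As background on why mean(...) from the question fails: mean() treats only its first argument as the data and matches later positional arguments to options such as trim, so passing several vectors raises an error instead of averaging each one. A minimal sketch of a per-variable alternative follows; describe_dots is a hypothetical helper name, not part of the original answer.

```r
# Hypothetical helper (not from the original posts): compute the statistics
# separately for every vector passed through the ellipsis.
describe_dots <- function(...) {
  vars <- list(...)  # each argument becomes one list element
  vapply(
    vars,
    function(v) c(mean = mean(v, na.rm = TRUE),
                  var  = var(v, na.rm = TRUE),
                  sd   = sd(v, na.rm = TRUE)),
    numeric(3)       # every variable yields exactly three numbers
  )
}

# Result: a matrix with rows mean/var/sd and one column per variable
res <- describe_dots(iris$Sepal.Width, iris$Petal.Length)
```

Unlike apply() over the bound matrix, this version does not require the vectors to have the same length, although the regression itself still does.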

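I think I've got it. Unfortunately, the ellipsis seems to be quirky to work with. Check whether cbind(...) runs correctly inside your function (when I inspected it in the output, it was only 1 column wide even though I had passed in 2 variables, which does not seem right).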
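
One quick way to run that check in isolation is to wrap only the cbind(...) step in a small function and look at the dimensions of the result. This is a sketch; check_dots is a hypothetical name used only for illustration.

```r
# Hypothetical helper: reproduce just the cbind(...) step from MLR()
# and report the dimensions of the resulting matrix.
check_dots <- function(...) {
  X <- as.matrix(cbind(...))
  dim(X)  # c(rows, columns)
}

dims <- check_dots(iris$Sepal.Width, iris$Petal.Length)
dims
```

If the second element of dims matches the number of vectors passed in, the ellipsis is being forwarded to cbind() as intended.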

My solution does not read the variable names; it uses placeholder names instead (Var_1, Var_2, ..., Var_n).


MLR <- function(y_var,...){  
  
  # these two packages will come in handy
  
  require(dplyr)
  require(tidyr)
  
  y <- y_var  
  X <- as.matrix(cbind(...))
  
  # firstly,we need to make df/tibble out of ellipsis
  
  X2 <- list(...)
  
  n <- tibble(n = rep(0,times = length(y)))
  
  index <- 0
  
  for(Var in X2){
    
    index <- index + 1
    n[,paste0("Var_",index)] <- Var
    
  }
  
  # after the df was created,now it's time for calculating desc
  # Using tidyr::gather with dplyr::summarize creates nice summary,
  # where each row is another variable
  
  descriptives <- tidyr::gather(n,key = "Variable",value = "Value") %>%
    group_by(Variable) %>%
    summarize(mean = mean(Value),var = var(Value),sd = sd(Value),.groups = "keep")
  
  # everything except the output list is the same
  
  intercept <- rep(1,length(y))
  
  
  list('y' = y,'descriptives' = descriptives[descriptives$Variable != "n",] # drop the "n" placeholder row by name,
                                          # because its position depends on how the groups are sorted
  )
}
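
If the real variable names are wanted instead of placeholders, one option in base R is to capture the unevaluated argument expressions with substitute() and use them as column labels. The sketch below assumes the arguments are equal-length numeric vectors; summarise_dots is a hypothetical helper, not part of either answer.

```r
# Hypothetical helper: label the columns built from `...` with the
# expressions the caller typed, then summarise each column.
summarise_dots <- function(...) {
  # substitute() keeps the arguments unevaluated; [-1] drops the `list` symbol
  exprs <- as.list(substitute(list(...)))[-1]
  labels <- unname(vapply(exprs,
                          function(e) paste(deparse(e), collapse = ""),
                          character(1)))
  X <- cbind(...)
  colnames(X) <- labels
  rbind(mean = colMeans(X, na.rm = TRUE),
        sd   = apply(X, 2, sd, na.rm = TRUE),
        var  = apply(X, 2, var, na.rm = TRUE))
}

# Columns are labelled "iris$Sepal.Width" and "iris$Petal.Length"
stats <- summarise_dots(iris$Sepal.Width, iris$Petal.Length)
```

The same labels could be assigned to the columns of X inside MLR() so that the coefficient vector and the summary share readable names.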
