How to quickly and accurately compute externally studentized residuals in R: the lm() regression, related topics on Stack, minimal examples, and a benchmark update

Given the constraints below, I want to compute the externally studentized residuals of a dataset {x, y} of size n in R:

  • (very) high accuracy
  • high performance (avoiding loops wherever possible)
  • the R language (including Rcpp)

The R code should be fast because it will be used extensively on many datasets with n in [10^3, 10^6] (at least 10^9 times in total). This question is part of a larger effort to estimate a custom statistic that requires studentized residuals. The most computationally demanding part is the one presented here, so solving it would greatly improve overall efficiency.

About the lm() regression

To collect the externally studentized residuals, one usually runs lm() and then rstudent(). The R function uses a method that avoids running n regressions to estimate the studentized residuals, which saves a lot of execution time. However, I do not want to use lm(), because I only need the residuals without the fancy extras that come with it (thus saving even more execution time).

When trying to decipher the R source code for the external residuals in lm(), I found it somewhat obscure, since it appears to call code from other external files (for example the influence() function). So at this point I have not been able to gather enough information to replicate the code section from the source alone.
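For what it's worth, the analytical shortcut that rstudent() relies on can be written down directly: the leave-one-out residual variance is available in closed form from the full-fit residual variance and the leverages h_ii, so no regression ever has to be re-run. Below is a minimal sketch of that identity for the simple regression x ~ y used throughout this question (studres_analytic is a name made up for illustration, not an existing function):

```r
# Externally studentized residuals of x ~ y without refitting n regressions.
studres_analytic <- function(x, y) {
  n  <- length(x)
  p  <- 2                                  # intercept + slope
  yc <- y - mean(y)
  b1 <- sum(yc * (x - mean(x))) / sum(yc^2)
  r  <- x - (mean(x) + b1 * yc)            # ordinary residuals
  h  <- 1/n + yc^2 / sum(yc^2)             # leverages
  s2 <- sum(r^2) / (n - p)                 # full-fit residual variance
  # leave-one-out variance via the standard identity, no refits needed
  s2_i <- ((n - p) * s2 - r^2 / (1 - h)) / (n - p - 1)
  r / sqrt(s2_i * (1 - h))
}
```

This agrees with rstudent(lm(x ~ y)) to machine precision, while doing only O(n) work after the single fit.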

Related topics on Stack

The following related topic has been found on Stack: How to compute Studentized Residuals in Python?

An R implementation of the Python procedure is given there, including a minimal example (corrected by @StéphaneLaurent, see the answers):

n = 10    
set.seed(1)
x = rnorm(n)
y = rnorm(n)
m = 2

mean_y = mean(y)
mean_x = mean(x)

diff_mean_sqr = (y - mean_y) %*% (y - mean_y)

beta_1 = ((y - mean_y) %*% (x - mean_x)) / diff_mean_sqr
beta_0 = mean_x - c(beta_1) * mean_y

x_hat = beta_0 + c(beta_1) * y

residuals = x - x_hat

h_ii = ((y - mean_y) ^ 2) / c(diff_mean_sqr) + (1 / n)
var_e = sqrt(vapply(1:n,function(i){
    fit <- lm.fit(cbind(1,y[-i]),x[-i])
    sum(fit$residuals^2)
    },numeric(1)) / (n-m-1))
SE_regression = var_e * (sqrt(1 - h_ii))

studentized_residuals = residuals / SE_regression

reg = rstudent(lm(x ~ y))
res = cbind(reg,studentized_residuals)

which produces the following differences:

index    reg            studentized_residuals
1   -0.595911898846465  -0.581348373714385
2   0.116208945967327   0.116097011762269
3   -2.04779452591111   -1.61939642040734
4   2.26350621688535    1.71995630000724
5   0.603322309518977   0.588222428131761
6   -1.5460639774285    -1.33486217871738
7   0.367900050364855   0.364393996552621
8   1.14745971090533    1.05271762293388
9   0.823888320713653   0.786630743176311
10  -0.449839343257121  -0.443475039943641

Minimal example

The following R attempt has been tested with arbitrary datasets, for illustration purposes only. It uses lm() / rstudent(), which is far too slow for our real application. The two parameters n1 and n2 correspond respectively to the number of iterations and the size of the vectors (denoted n above). To match our problem, we would typically pick n1 in [10^6, 10^9] and n2 in [10^3, 10^6]:

Stud = function(n1, n2){
  res = data.frame(matrix(vector(), n2, n1))

  for(i in 1 : n1){
    x = rnorm(n2)
    y = rnorm(n2)
    reg = lm(x ~ y)
    res[,i] = rstudent(reg)
  }
  res   # return the collected residuals
}
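Beyond speeding up one dataset, the loop over the n1 datasets can itself be removed: store the replicates as columns of two n2 × n1 matrices and apply the same closed-form studentization column-wise. A sketch under that assumption (stud_all is a hypothetical helper, not part of any package):

```r
# Externally studentized residuals for many datasets at once.
# X, Y: n2 x n1 matrices; column j holds dataset j, with X the response.
stud_all <- function(X, Y) {
  n  <- nrow(X); p <- 2                       # intercept + slope
  Yc <- sweep(Y, 2, colMeans(Y))              # center each column
  Xc <- sweep(X, 2, colMeans(X))
  ss <- colSums(Yc^2)
  b1 <- colSums(Yc * Xc) / ss                 # per-column slopes
  R  <- Xc - sweep(Yc, 2, b1, `*`)            # per-column residuals
  H  <- sweep(Yc^2, 2, ss, `/`) + 1/n         # per-column leverages
  s2 <- colSums(R^2) / (n - p)                # full-fit residual variances
  S2i <- sweep(-R^2 / (1 - H), 2, (n - p) * s2, `+`) / (n - p - 1)
  R / sqrt(S2i * (1 - H))                     # leave-one-out studentization
}
```

Each column of the result matches rstudent(lm(X[,j] ~ Y[,j])), with no per-dataset model fit at all.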

Update with a benchmark and another (complete) minimal example:

Here we show a complete benchmark in which various functions from Stack are tested against lm(), with the goal of collecting the externally studentized residuals. To collect these residuals, we need to run n regressions. Results are given after the code for 100 and 500 replications.

#Packages

install.packages("Rcpp")
library(Rcpp)
install.packages("RcppArmadillo")
library(RcppArmadillo)
install.packages("RcppEigen")
library(RcppEigen)
install.packages("stats")
library(stats)
install.packages("speedglm")
library(speedglm)
install.packages("Rfast")
library(Rfast)
install.packages("rbenchmark")
library(rbenchmark)

## start from SEXP, most conversions, longest code
src <- '
Rcpp::List fLmSEXP(SEXP Xs, SEXP ys) {
    Rcpp::NumericMatrix Xr(Xs);
    Rcpp::NumericVector yr(ys);
    int n = Xr.nrow(), k = Xr.ncol();
    arma::mat X(Xr.begin(), n, k, false);
    arma::colvec y(yr.begin(), yr.size(), false);
    int df = n - k;
    // fit model y ~ X, extract residuals
    arma::colvec coef = arma::solve(X, y);
    arma::colvec res  = y - X*coef;
    double s2 = std::inner_product(res.begin(), res.end(), res.begin(), 0.0)/df;
    // std.errors of coefficients
    arma::colvec sderr = arma::sqrt(s2 *
       arma::diagvec(arma::pinv(arma::trans(X)*X)));
    return Rcpp::List::create(Rcpp::Named("coefficients") = coef,
                              Rcpp::Named("stderr")       = sderr,
                              Rcpp::Named("df")           = df,
                              Rcpp::Named("residuals")    = res);
}
'
cppFunction(code=src, depends="RcppArmadillo")

## start from Rcpp types as early RcppArmadillo examples did
src <- '
Rcpp::List fLmTwoCasts(Rcpp::NumericMatrix Xr, Rcpp::NumericVector yr) {
    int n = Xr.nrow(), k = Xr.ncol();
    arma::mat X(Xr.begin(), n, k, false);
    arma::colvec y(yr.begin(), yr.size(), false);
    int df = n - k;
    // fit model y ~ X, extract residuals
    arma::colvec coef = arma::solve(X, y);
    arma::colvec res  = y - X*coef;
    double s2 = std::inner_product(res.begin(), res.end(), res.begin(), 0.0)/df;
    // std.errors of coefficients
    arma::colvec sderr = arma::sqrt(s2 *
       arma::diagvec(arma::pinv(arma::trans(X)*X)));
    return Rcpp::List::create(Rcpp::Named("coefficients") = coef,
                              Rcpp::Named("stderr")       = sderr,
                              Rcpp::Named("df")           = df,
                              Rcpp::Named("residuals")    = res);
}
'
cppFunction(code=src, depends="RcppArmadillo")

## start from Armadillo types
src <- '
Rcpp::List fLmOneCast(arma::mat X, arma::colvec y) {
    int df = X.n_rows - X.n_cols;
    // fit model y ~ X, extract residuals
    arma::colvec coef = arma::solve(X, y);
    arma::colvec res  = y - X*coef;
    double s2 = std::inner_product(res.begin(), res.end(), res.begin(), 0.0)/df;
    // std.errors of coefficients
    arma::colvec sderr = arma::sqrt(s2 *
       arma::diagvec(arma::pinv(arma::trans(X)*X)));
    return Rcpp::List::create(Rcpp::Named("coefficients") = coef,
                              Rcpp::Named("stderr")       = sderr,
                              Rcpp::Named("df")           = df,
                              Rcpp::Named("residuals")    = res);
}
'
cppFunction(code=src, depends="RcppArmadillo")

## start from Armadillo types passed as constant references
src <- '
Rcpp::List fLmConstRef(const arma::mat & X, const arma::colvec & y) {
    int df = X.n_rows - X.n_cols;
    // fit model y ~ X, extract residuals
    arma::colvec coef = arma::solve(X, y);
    arma::colvec res  = y - X*coef;
    double s2 = std::inner_product(res.begin(), res.end(), res.begin(), 0.0)/df;
    // std.errors of coefficients
    arma::colvec sderr = arma::sqrt(s2 *
       arma::diagvec(arma::pinv(arma::trans(X)*X)));
    return Rcpp::List::create(Rcpp::Named("coefficients") = coef,
                              Rcpp::Named("stderr")       = sderr,
                              Rcpp::Named("df")           = df,
                              Rcpp::Named("residuals")    = res);
}
'
cppFunction(code=src, depends="RcppArmadillo")

#Benchmark

n = 15
set.seed(1)
y = rnorm(n)
x = rnorm(n)
m = 2

# Shared computation: externally studentized residuals of x ~ y, where
# rss_loo(i) returns the residual sum of squares of the regression with
# observation i removed. Each benchmark entry below differs only in how
# that leave-one-out regression is fitted.
stud = function(rss_loo){
  mean_y = mean(y)
  mean_x = mean(x)

  diff_mean_sqr = (y - mean_y) %*% (y - mean_y)

  beta_1 = ((y - mean_y) %*% (x - mean_x)) / diff_mean_sqr
  beta_0 = mean_x - c(beta_1) * mean_y

  x_hat = beta_0 + c(beta_1) * y

  residuals = x - x_hat

  h_ii = ((y - mean_y) ^ 2) / c(diff_mean_sqr) + (1 / n)

  var_e = sqrt(vapply(1:n, rss_loo, numeric(1)) / (n - m - 1))

  SE_regression = var_e * (sqrt(1 - h_ii))

  residuals / SE_regression
}

data = benchmark(
  "OneCast" = stud(function(i){
    fit <- fLmOneCast(cbind(1, y[-i]), x[-i])
    sum(fit$residuals^2)
  }),
  "TwoCast" = stud(function(i){
    fit <- fLmTwoCasts(cbind(1, y[-i]), x[-i])
    sum(fit$residuals^2)
  }),
  "Const" = stud(function(i){
    fit <- fLmConstRef(cbind(1, y[-i]), x[-i])
    sum(fit$residuals^2)
  }),
  "Sexp" = stud(function(i){
    fit <- fLmSEXP(cbind(1, y[-i]), x[-i])
    sum(fit$residuals^2)
  }),
  "Fast" = stud(function(i){
    fit <- fastLm(x[-i] ~ y[-i])
    sum(fit$residuals^2)
  }),
  "Speed" = stud(function(i){
    fit <- speedlm(x[-i] ~ y[-i], fitted = TRUE)
    sum((x[-i] - fit$fitted.values)^2)
  }),
  ".Fit" = stud(function(i){
    fit <- .lm.fit(cbind(1, y[-i]), x[-i])
    sum(fit$residuals^2)
  }),
  "Fit" = stud(function(i){
    fit <- lmfit(cbind(1, y[-i]), x[-i])
    sum(fit$residuals^2)
  }),
  "Lm" = stud(function(i){
    fit <- lm(x[-i] ~ y[-i])
    sum(fit$residuals^2)
  }),
  "Basic" = {
    reg <- lm(x ~ y)
    reg_stud <- rstudent(reg)
  },
  replications = 500,
  columns = c("test","elapsed","replications"))

Results: on this single benchmark, rstudent(lm()) is faster than everything else:

      test elapsed replications
7     .Fit   13.84          100
10   Basic    0.25          100
3    Const    7.37          100
5     Fast   99.84          100
8      Fit    7.06          100
9       Lm  105.25          100
1  OneCast    7.61          100
4     Sexp    7.66          100
6    Speed  184.76          100
2  TwoCast    7.17          100

7     .Fit   63.63          500
10   Basic    0.93          500
3    Const   34.44          500
5     Fast  438.95          500
8      Fit   31.11          500
9       Lm  471.37          500
1  OneCast   34.29          500
4     Sexp   33.48          500
6    Speed  794.73          500
2  TwoCast   33.51          500

Explanation: it seems that R uses an analytical alternative that avoids running n regressions, which makes the computation much faster. Hence the question remains: how can we be competitive with rstudent(lm()), or how can we reverse-engineer the original source code (which is hard to collect)?

Solution

You get the same results by replacing your var_e with:

var_e = vapply(1:n, function(i){
  sigma(lm(x[-i] ~ y[-i]))
}, numeric(1))

To obtain that efficiently, do not use lm but rather lm.fit:

var_e = sqrt(vapply(1:n, function(i){
  fit <- lm.fit(cbind(1, y[-i]), x[-i])
  sum(fit$residuals^2)
}, numeric(1)) / (n - m - 1))
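As a quick sanity check on simulated data (illustration only), plugging that lm.fit-based var_e into the question's formulas reproduces rstudent(lm()) exactly:

```r
set.seed(1)
n <- 10; m <- 2
y <- rnorm(n); x <- rnorm(n)

yc   <- y - mean(y)
b1   <- sum(yc * (x - mean(x))) / sum(yc^2)
res  <- x - (mean(x) + b1 * yc)               # full-fit residuals
h_ii <- 1/n + yc^2 / sum(yc^2)                # leverages

var_e <- sqrt(vapply(1:n, function(i){
  fit <- lm.fit(cbind(1, y[-i]), x[-i])       # regression without point i
  sum(fit$residuals^2)
}, numeric(1)) / (n - m - 1))

stud <- res / (var_e * sqrt(1 - h_ii))
all.equal(stud, unname(rstudent(lm(x ~ y))))  # TRUE
```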

Edit:

This edit is to show that a faster function than the one previously given has been found:

fast_rstudent <- function(X, y, intercept = TRUE){
  mqr <- .Call(stats:::C_Cdqrls, cbind(intercept, X), y, 1e-7, FALSE)
  res <- .Call(stats:::C_influence, mqr, mqr$residuals, 1e-12)
  mqr$residuals/(res$sigma*sqrt(1-res$hat))
}

So far this function is very fast.

Previous answer

Since you are using R, you can solve this problem with the qr decomposition. Your aim is to write an rstudent function that is faster than the built-in one by getting rid of overhead function calls etc. That means you should only use the necessary internal functions. Here is a quick way to do it:

my_rstudent <- function(X, y, intercept = TRUE) {
  X <- cbind(intercept, X)
  u <- .Call(stats:::C_Cdqrls, X, y, 1e-7, FALSE)
  d <- dim(X)
  n <- as.integer(d[1L])
  k <- as.integer(d[2L])
  df_res <- n - k

  z <- .Internal(diag(1, n, k))

  v <- .Fortran(stats:::.F_dqrqy, as.double(u$qr), n, k, as.double(u$qraux),
                z, k, qy = z)$qy

  h_ii <- .Internal(rowSums(v^2, n, k, FALSE))

  rstand <- u$residuals/sqrt(sum(u$residuals**2)/df_res)/sqrt(1-h_ii)

  rstand * sqrt((df_res - 1)/(df_res - rstand^2))
}

To some extent, this function abuses R by almost entirely removing the overhead function calls. It assumes that what is given to the function is correct.

Benchmark:

n = 10    
set.seed(1)
x = rnorm(n)
y = rnorm(n)

cbind(mine=my_rstudent(x,y),from_R=rstudent(lm(y~x)))
          mine      from_R
1   0.92113157  0.92113157
2   0.15753536  0.15753536
3  -1.69587949 -1.69587949
4  -3.59182456 -3.59182456
5   0.98274664  0.98274664
6  -0.85765961 -0.85765961
7  -0.07768369 -0.07768369
8   1.05874766  1.05874766
9   0.80181623  0.80181623
10  0.11418833  0.11418833

For small datasets, the overhead function calls slow rstudent down considerably:

microbenchmark::microbenchmark(my_rstudent(x, y), rstudent(lm(y~x)), unit="relative", times = 100)
Unit: relative
                expr      min       lq     mean   median       uq      max neval
   my_rstudent(x, y)  1.00000  1.00000  1.00000  1.00000  1.00000  1.00000   100
 rstudent(lm(y ~ x)) 45.36667 37.20755 26.89753 24.29545 22.39587 11.31733   100

Huge dataset:

n = 1000
set.seed(1)
x = rnorm(n)
y = rnorm(n)
microbenchmark::microbenchmark(my_rstudent(x, y), rstudent(lm(y~x)), unit="relative", times = 100)
Unit: relative
                expr      min       lq     mean   median       uq      max neval
   my_rstudent(x, y) 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000   100
 rstudent(lm(y ~ x)) 8.530228 8.059269 7.700426 7.848123 7.616909 3.877305   100

Very huge dataset:

n = 1000000
set.seed(1)
x = rnorm(n)
y = rnorm(n)
microbenchmark::microbenchmark(my_rstudent(x, y), rstudent(lm(y~x)), unit="relative", times = 10)
Unit: relative
                expr      min       lq     mean   median       uq     max neval
   my_rstudent(x, y) 1.000000 1.000000 1.000000 1.000000 1.000000 1.00000    10
 rstudent(lm(y ~ x)) 1.510198 1.560989 1.486083 1.666609 1.603455 1.01154    10

I think the way to solve your problem is to first get rid of all unnecessary function-call overhead; if that is still not fast enough, try converting the code to C++ and running it with Rcpp. It is very likely that you can improve on my results by computing the residuals with your own implementation built on .lm.fit, instead of using lm.fit as I did.
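As a sketch of that suggestion (my own assumption, not the answerer's code): .lm.fit skips the dispatch and checking done by lm.fit, and the hat values can be recovered from its QR output via qr.qy instead of calling lm.influence. studres_dotfit is a hypothetical name; like myFunc below, it assumes a single-column x with no intercept:

```r
# Studentized residuals from .lm.fit (hypothetical sketch, single-column x,
# no intercept, mirroring myFunc's setup).
studres_dotfit <- function(x, y, n = nrow(x)) {
  fit <- .lm.fit(x, y)
  r   <- fit$residuals
  # .lm.fit returns the QR pieces qr.qy() needs; class the list as "qr"
  Q1  <- qr.qy(structure(fit, class = "qr"), diag(1, n, fit$rank))
  h   <- rowSums(Q1^2)                              # leverages
  p   <- fit$rank
  rdf <- n - p
  sr  <- r / (sqrt(1 - h) * sqrt(sum(r^2) / rdf))   # internally studentized
  sr / sqrt((n - p - sr^2) / (n - p - 1))           # externally studentized
}
```

This reproduces the lm.fit-based numbers (test3/test4 below) while avoiding both lm.fit and lm.influence.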

I also checked whether the studentized residuals differ depending on which function you use (lm, lm.fit, .lm.fit), and it turns out they do. However, the residuals from my function here are equal to those produced by MASS::studres for the regression y ~ x, where x has only one column.

Here are my code and a benchmark against the fastest version above, called "Basic":

library(rbenchmark)
library(microbenchmark)
library(MASS)

set.seed(1)

x <- matrix(rnorm(500),ncol = 1)
y <- matrix(rnorm(500),ncol = 1)
myFunc <- function(x, y, n = 500){
  # tmp <- .lm.fit(x, y) # linear model fit
  object <- lm.fit(x = x, y = y)
  
  resid <- object$residuals
  hat <- lm.influence(object,do.coef = FALSE)$hat
  # hat <- hat[hat > 0] # remove checks
  # ok <- !(is.na(resid)) # remove checks
  # n.miss <- sum(!ok) # remove checks
  
  # resid <- resid[ok] # remove checks
  # n <- length(resid) 
  # p <- object$rank # equal to one
  p <- 1
  rdf <- n - 1
  
  studres <- resid
  
  stddev <- sqrt(sum(resid^2)/rdf)
  sr <- resid/(sqrt(1 - hat) * stddev)
  stdres <- sr
  studres <- sr/sqrt((n - p - sr^2)/(n - p - 1))
  
  studres <- naresid(object$na.action,studres)
  
  return(studres)
}

test1 <- stats::rstudent(lm(x ~ y)) # rstudent doesn't work with lm.fit
test2 <- MASS::studres(lm(x ~ y))
test3 <- MASS::studres(lm.fit(x,y))
test4 <- myFunc(x, y, n = 500)

> head(cbind(test1,test2,test3,test4))
       test1      test2       test3       test4
1 -0.6368094 -0.6368094  0.04696790  0.04696790
2  0.1493050  0.1493050 -0.27286396 -0.27286396
3 -0.8941217 -0.8941217 -1.15505676 -1.15505676
4  1.5598965  1.5598965  0.07729179  0.07729179
5  0.3440252  0.3440252  0.95155123  0.95155123
6 -0.7714317 -0.7714317  1.47600416  1.47600416

####################################

mbm <- microbenchmark("lm" = {rstudent(lm(y~x))},
                      "MASS_lm" = {
                        MASS::studres(lm(y~x))
                      },
                      "MASS_lm.fit" = {
                        MASS::studres(lm.fit(x = x, y = y))
                      },
                      "myFunc" = {myFunc(x, y, n = 500)},
                      times = 100
)

> mbm
Unit: microseconds
        expr     min       lq     mean   median        uq      max neval
          lm 767.001 869.1510 1188.023 977.1505 1185.5010 8279.801   100
     MASS_lm 704.601 909.2000 1085.261 997.3515 1168.8505 2052.202   100
 MASS_lm.fit 168.001 195.0510  282.166 212.9510  254.1015 2912.201   100
      myFunc 147.901 168.8015  234.261 190.0010  249.7515 1193.701   100

Note that you must specify n according to the length of the vectors x and y.

