Spark-submit remote to EMR with Kerberos fails to renew token: Kind: HDFS_DELEGATION_TOKEN, Service:

I am using EMR-6.0.0 with Kerberos, Spark 2.4.4, and Amazon Hadoop 3.2.1, with a single master node.

I am trying to submit a remote job with spark-submit, and no matter what I try I keep getting the following exception:

 Exception in thread "main" org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1614359247442_0018 to YARN : Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, Service: 3.92.137.244:8020, Ident: (token for hadoop: HDFS_DELEGATION_TOKEN owner=hadoop/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY, renewer=yarn, realUser=, issueDate=1614442935775, maxDate=1615047735775, sequenceNumber=28, masterKeyId=2)
        at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:327)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:183)
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:1134)
        at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1526)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:853)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
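
When security is enabled, the YARN ResourceManager (the renewer=yarn named in the token above) immediately tries to renew every delegation token attached to the submission, and it does so against the service address embedded in the token, here 3.92.137.244:8020; the submission therefore only succeeds if the ResourceManager itself can reach and authenticate to the NameNode at exactly that address. A first sanity check on the client side (a sketch reusing the keytab and principal from the submit commands below) is to confirm the client's own Kerberos ticket:

# Obtain and inspect a ticket for the submitting principal (keytab/principal as in the commands below):
kinit -kt /etc/hadoop.keytab hadoop/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY
klist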

I can see the job reach my ResourceManager, but once it gets there it just sits in the NEW state for the next two minutes:

21/02/27 09:24:18 INFO YarnClientImpl: Application submission is not finished, submitted application application_1614359247442_0018 is still in NEW

and then it fails:

Exception in thread "main" org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1614359247442_0018 to YARN : Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, masterKeyId=2)
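
Since the renewal is attempted by the ResourceManager on the master node rather than by the submitting client, the client output ends with this generic message; the underlying cause (connection refused, unknown host, a GSS failure, and so on) only shows up in the ResourceManager log on the master. One hedged way to look for it (the log path below is the EMR default and may differ on other layouts):

# On the EMR master node; the RM log location shown is the EMR default:
grep -i "renew" /var/log/hadoop-yarn/yarn-yarn-resourcemanager-*.log | tail -n 20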

Some of the spark-submit commands I have used are:

./bin/spark-submit -v --master yarn --deploy-mode cluster --executor-memory 512MB --total-executor-cores 10 --conf spark.hadoop.fs.hdfs.impl.disable.cache=true --conf spark.ego.uname=hadoop/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY --conf spark.ego.keytab=/etc/hadoop.keytab   --principal hadoop/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY --keytab /etc/hadoop.keytab $SPARK_HOME/examples/src/main/python/pi.py

I have also used this variant (with --conf spark.ego.keytab):

./bin/spark-submit -v --master yarn --deploy-mode cluster --executor-memory 512MB --total-executor-cores 10 --conf spark.hadoop.fs.hdfs.impl.disable.cache=true --conf spark.ego.uname=hadoop/ip-172-31-89-107.ec2.internal@MODELOP --conf spark.ego.keytab=/etc/hadoop.keytab   --principal hadoop/ip-172-31-89-107.ec2.internal@MODELOP --keytab /etc/hadoop.keytab $SPARK_HOME/examples/src/main/python/pi.py
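
As an aside, --total-executor-cores only applies to the standalone and Mesos masters, and the spark.ego.* properties belong to IBM Spectrum Conductor (EGO), so YARN silently ignores all three; they are unrelated to the failure, but a minimal YARN-only equivalent (a sketch assuming the same keytab and principal remain valid) would be:

./bin/spark-submit -v \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 512M \
  --num-executors 5 \
  --principal hadoop/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY \
  --keytab /etc/hadoop.keytab \
  $SPARK_HOME/examples/src/main/python/pi.py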

The full execution log:

miguelr@Miguels-MacBook-Pro-2 spark-2.4.4_emr6_kerberos % ./bin/spark-submit -v --master yarn --deploy-mode cluster --executor-memory 512MB --total-executor-cores 10 --conf spark.hadoop.fs.hdfs.impl.disable.cache=true --conf spark.ego.uname=hadoop/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY --conf spark.ego.keytab=/etc/hadoop.keytab   --principal hadoop/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY --keytab /etc/hadoop.keytab $SPARK_HOME/examples/src/main/python/pi.py
Using properties file: /opt/sw/spark-2.4.4_emr6_kerberos/conf/spark-defaults.conf
21/02/27 09:21:57 WARN Utils: Your hostname, Miguels-MacBook-Pro-2.local resolves to a loopback address: 127.0.0.1; using 192.168.0.14 instead (on interface en0)
21/02/27 09:21:57 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Adding default property: spark.sql.warehouse.dir=hdfs:///user/spark/warehouse
Adding default property: spark.yarn.dist.files=/etc/spark/conf/hive-site.xml
Adding default property: spark.history.kerberos.keytab=/etc/spark.keytab
Adding default property: spark.sql.parquet.fs.optimized.committer.optimization-enabled=true
Adding default property: spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:+ExitOnOutOfMemoryError
Adding default property: spark.history.fs.logDirectory=hdfs:///var/log/spark/apps
Adding default property: spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem=2
Adding default property: spark.eventLog.enabled=true
Adding default property: spark.shuffle.service.enabled=true
Adding default property: spark.driver.extraLibraryPath=/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
Adding default property: spark.yarn.historyServer.address=ip-172-31-22-222.ec2.internal:18080
Adding default property: spark.stage.attempt.ignoreOnDecommissionFetchFailure=true
Adding default property: spark.driver.memory=2048M
Adding default property: spark.files.fetchFailure.unRegisterOutputOnHost=true
Adding default property: spark.history.kerberos.principal=spark/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY
Adding default property: spark.resourceManager.cleanupExpiredHost=true
Adding default property: spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=/etc/passwd:/etc/passwd:ro,/usr/lib:/docker/usr/lib:ro,/usr/share:/docker/usr/share:ro,/mnt/s3:/mnt/s3:rw,/mnt1/s3:/mnt1/s3:rw
Adding default property: spark.yarn.appMasterEnv.SPARK_PUBLIC_DNS=$(hostname -f)
Adding default property: spark.sql.emr.internal.extensions=com.amazonaws.emr.spark.EmrSparkSessionExtensions
Adding default property: spark.driver.extraJavaOptions=-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:+ExitOnOutOfMemoryError
Adding default property: spark.hadoop.fs.s3.getObject.initialSocketTimeoutMilliseconds=50000
Adding default property: spark.master=yarn
Adding default property: spark.sql.parquet.output.committer.class=com.amazon.emr.committer.EmrOptimizedSparkSqlParquetOutputCommitter
Adding default property: spark.blacklist.decommissioning.timeout=1h
Adding default property: spark.executor.extraLibraryPath=/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
Adding default property: spark.sql.hive.metastore.sharedPrefixes=com.amazonaws.services.dynamodbv2
Adding default property: spark.executor.memory=4743M
Adding default property: spark.driver.extraClassPath=/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/docker/usr/lib/hadoop-lzo/lib/*:/docker/usr/lib/hadoop/hadoop-aws.jar:/docker/usr/share/aws/aws-java-sdk/*:/docker/usr/share/aws/emr/emrfs/conf:/docker/usr/share/aws/emr/emrfs/lib/*:/docker/usr/share/aws/emr/emrfs/auxlib/*:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/docker/usr/share/aws/emr/security/conf:/docker/usr/share/aws/emr/security/lib/*:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar
Adding default property: spark.eventLog.dir=hdfs:///var/log/spark/apps
Adding default property: spark.dynamicAllocation.enabled=true
Adding default property: spark.executor.extraClassPath=/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/docker/usr/lib/hadoop-lzo/lib/*:/docker/usr/lib/hadoop/hadoop-aws.jar:/docker/usr/share/aws/aws-java-sdk/*:/docker/usr/share/aws/emr/emrfs/conf:/docker/usr/share/aws/emr/emrfs/lib/*:/docker/usr/share/aws/emr/emrfs/auxlib/*:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/docker/usr/share/aws/emr/security/conf:/docker/usr/share/aws/emr/security/lib/*:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar
Adding default property: spark.executor.cores=2
Adding default property: spark.history.ui.port=18080
Adding default property: spark.blacklist.decommissioning.enabled=true
Adding default property: spark.history.kerberos.enabled=true
Adding default property: spark.decommissioning.timeout.threshold=20
Adding default property: spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=/etc/passwd:/etc/passwd:ro,/mnt1/s3:/mnt1/s3:rw
Adding default property: spark.hadoop.mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem=true
Adding default property: spark.hadoop.yarn.timeline-service.enabled=false
Adding default property: spark.yarn.executor.memoryOverheadFactor=0.1875
Parsed arguments:
  master                  yarn
  deployMode              cluster
  executorMemory          512MB
  executorCores           2
  totalExecutorCores      10
  propertiesFile          /opt/sw/spark-2.4.4_emr6_kerberos/conf/spark-defaults.conf
  driverMemory            2048M
  driverCores             null
  driverExtraClassPath    /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/docker/usr/lib/hadoop-lzo/lib/*:/docker/usr/lib/hadoop/hadoop-aws.jar:/docker/usr/share/aws/aws-java-sdk/*:/docker/usr/share/aws/emr/emrfs/conf:/docker/usr/share/aws/emr/emrfs/lib/*:/docker/usr/share/aws/emr/emrfs/auxlib/*:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/docker/usr/share/aws/emr/security/conf:/docker/usr/share/aws/emr/security/lib/*:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar
  driverExtraLibraryPath  /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
  driverExtraJavaOptions  -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:+ExitOnOutOfMemoryError
  supervise               false
  queue                   null
  numExecutors            null
  files                   null
  pyFiles                 null
  archives                null
  mainClass               null
  primaryResource         file:/opt/sw/spark-2.4.4_emr6_kerberos/examples/src/main/python/pi.py
  name                    pi.py
  childArgs               []
  jars                    null
  packages                null
  packagesExclusions      null
  repositories            null
  verbose                 true

Spark properties used, including those specified through
 --conf and those from the properties file /opt/sw/spark-2.4.4_emr6_kerberos/conf/spark-defaults.conf:
  (spark.sql.emr.internal.extensions,com.amazonaws.emr.spark.EmrSparkSessionExtensions)
  (spark.history.kerberos.enabled,true)
  (spark.blacklist.decommissioning.timeout,1h)
  (spark.yarn.executor.memoryOverheadFactor,0.1875)
  (spark.executor.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native)
  (spark.history.kerberos.principal,spark/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY)
  (spark.blacklist.decommissioning.enabled,true)
  (spark.hadoop.fs.hdfs.impl.disable.cache,true)
  (spark.hadoop.yarn.timeline-service.enabled,false)
  (spark.driver.memory,2048M)
  (spark.executor.memory,4743M)
  (spark.driver.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native)
  (spark.sql.parquet.fs.optimized.committer.optimization-enabled,true)
  (spark.sql.warehouse.dir,hdfs:///user/spark/warehouse)
  (spark.yarn.historyServer.address,ip-172-31-22-222.ec2.internal:18080)
  (spark.ego.uname,hadoop/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY)
  (spark.eventLog.enabled,true)
  (spark.yarn.dist.files,/etc/spark/conf/hive-site.xml)
  (spark.files.fetchFailure.unRegisterOutputOnHost,true)
  (spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS,/etc/passwd:/etc/passwd:ro,/mnt1/s3:/mnt1/s3:rw)
  (spark.history.ui.port,18080)
  (spark.stage.attempt.ignoreOnDecommissionFetchFailure,true)
  (spark.hadoop.fs.s3.getObject.initialSocketTimeoutMilliseconds,50000)
  (spark.yarn.appMasterEnv.SPARK_PUBLIC_DNS,$(hostname -f))
  (spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS,/mnt1/s3:/mnt1/s3:rw)
  (spark.executor.extraJavaOptions,-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:+ExitOnOutOfMemoryError)
  (spark.resourceManager.cleanupExpiredHost,true)
  (spark.history.fs.logDirectory,hdfs:///var/log/spark/apps)
  (spark.shuffle.service.enabled,true)
  (spark.driver.extraJavaOptions,-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:+ExitOnOutOfMemoryError)
  (spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem,2)
  (spark.history.kerberos.keytab,/etc/spark.keytab)
  (spark.sql.hive.metastore.sharedPrefixes,com.amazonaws.services.dynamodbv2)
  (spark.eventLog.dir,hdfs:///var/log/spark/apps)
  (spark.master,yarn)
  (spark.dynamicAllocation.enabled,true)
  (spark.sql.parquet.output.committer.class,com.amazon.emr.committer.EmrOptimizedSparkSqlParquetOutputCommitter)
  (spark.ego.keytab,/etc/hadoop.keytab)
  (spark.executor.cores,2)
  (spark.decommissioning.timeout.threshold,20)
  (spark.driver.extraClassPath,/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/docker/usr/lib/hadoop-lzo/lib/*:/docker/usr/lib/hadoop/hadoop-aws.jar:/docker/usr/share/aws/aws-java-sdk/*:/docker/usr/share/aws/emr/emrfs/conf:/docker/usr/share/aws/emr/emrfs/lib/*:/docker/usr/share/aws/emr/emrfs/auxlib/*:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/docker/usr/share/aws/emr/security/conf:/docker/usr/share/aws/emr/security/lib/*:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar)
  (spark.hadoop.mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem,true)

21/02/27 09:21:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Main class:
org.apache.spark.deploy.yarn.YarnClusterApplication
Arguments:
--primary-py-file
file:/opt/sw/spark-2.4.4_emr6_kerberos/examples/src/main/python/pi.py
--class
org.apache.spark.deploy.PythonRunner
Spark config:
(spark.yarn.keytab,/etc/hadoop.keytab)
(spark.sql.warehouse.dir,hdfs:///user/spark/warehouse)
(spark.yarn.dist.files,file:/etc/spark/conf/hive-site.xml)
(spark.history.kerberos.keytab,/etc/spark.keytab)
(spark.sql.parquet.fs.optimized.committer.optimization-enabled,true)
(spark.executor.extraJavaOptions,-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:+ExitOnOutOfMemoryError)
(spark.history.fs.logDirectory,hdfs:///var/log/spark/apps)
(spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem,2)
(spark.eventLog.enabled,true)
(spark.shuffle.service.enabled,true)
(spark.driver.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native)
(spark.hadoop.fs.hdfs.impl.disable.cache,true)
(spark.yarn.historyServer.address,ip-172-31-22-222.ec2.internal:18080)
(spark.stage.attempt.ignoreOnDecommissionFetchFailure,true)
(spark.app.name,pi.py)
(spark.yarn.principal,hadoop/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY)
(spark.driver.memory,2048M)
(spark.files.fetchFailure.unRegisterOutputOnHost,true)
(spark.history.kerberos.principal,spark/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY)
(spark.ego.keytab,/etc/hadoop.keytab)
(spark.resourceManager.cleanupExpiredHost,true)
(spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS,/mnt1/s3:/mnt1/s3:rw)
(spark.yarn.appMasterEnv.SPARK_PUBLIC_DNS,$(hostname -f))
(spark.sql.emr.internal.extensions,com.amazonaws.emr.spark.EmrSparkSessionExtensions)
(spark.ego.uname,hadoop/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY)
(spark.driver.extraJavaOptions,-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:+ExitOnOutOfMemoryError)
(spark.hadoop.fs.s3.getObject.initialSocketTimeoutMilliseconds,50000)
(spark.submit.deployMode,cluster)
(spark.master,yarn)
(spark.sql.parquet.output.committer.class,com.amazon.emr.committer.EmrOptimizedSparkSqlParquetOutputCommitter)
(spark.blacklist.decommissioning.timeout,1h)
(spark.executor.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native)
(spark.sql.hive.metastore.sharedPrefixes,com.amazonaws.services.dynamodbv2)
(spark.executor.memory,512MB)
(spark.eventLog.dir,hdfs:///var/log/spark/apps)
(spark.dynamicAllocation.enabled,true)
(spark.executor.cores,2)
(spark.history.ui.port,18080)
(spark.yarn.isPython,true)
(spark.blacklist.decommissioning.enabled,true)
(spark.history.kerberos.enabled,true)
(spark.decommissioning.timeout.threshold,20)
(spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS,/mnt1/s3:/mnt1/s3:rw)
(spark.hadoop.mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem,true)
(spark.hadoop.yarn.timeline-service.enabled,false)
(spark.yarn.executor.memoryOverheadFactor,0.1875)
Classpath elements:



Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/02/27 09:21:58 INFO Client: Kerberos credentials: principal = hadoop/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY, keytab = /etc/hadoop.keytab
21/02/27 09:21:58 INFO RMProxy: Connecting to ResourceManager at ip-172-31-22-222.ec2.internal/3.92.137.244:8032
21/02/27 09:21:59 INFO Client: Requesting a new application from cluster with 1 NodeManagers
21/02/27 09:21:59 INFO Configuration: resource-types.xml not found
21/02/27 09:21:59 INFO ResourceUtils: Unable to find 'resource-types.xml'.
21/02/27 09:21:59 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (12288 MB per container)
21/02/27 09:21:59 INFO Client: Will allocate AM container, with 2432 MB memory including 384 MB overhead
21/02/27 09:21:59 INFO Client: Setting up container launch context for our AM
21/02/27 09:21:59 INFO Client: Setting up the launch environment for our AM container
21/02/27 09:21:59 INFO Client: Preparing resources for our AM container
21/02/27 09:22:00 INFO Client: To enable the AM to login from keytab, credentials are being copied over to the AM via the YARN Secure Distributed Cache.
21/02/27 09:22:00 INFO Client: Uploading resource file:/etc/hadoop.keytab -> hdfs://ip-172-31-22-222.ec2.internal:8020/user/hadoop/.sparkStaging/application_1614359247442_0018/hadoop.keytab
21/02/27 09:22:01 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
21/02/27 09:22:03 INFO Client: Uploading resource file:/private/var/folders/38/ml5dcrkd6tdbfm8tk2kqb0880000gn/T/spark-7fdb5051-41b3-4d5d-b168-a90b09682f58/__spark_libs__7065880233350683192.zip -> hdfs://ip-172-31-22-222.ec2.internal:8020/user/hadoop/.sparkStaging/application_1614359247442_0018/__spark_libs__7065880233350683192.zip
21/02/27 09:22:10 INFO Client: Uploading resource file:/etc/spark/conf/hive-site.xml -> hdfs://ip-172-31-22-222.ec2.internal:8020/user/hadoop/.sparkStaging/application_1614359247442_0018/hive-site.xml
21/02/27 09:22:11 INFO Client: Uploading resource file:/opt/sw/spark-2.4.4_emr6_kerberos/examples/src/main/python/pi.py -> hdfs://ip-172-31-22-222.ec2.internal:8020/user/hadoop/.sparkStaging/application_1614359247442_0018/pi.py
21/02/27 09:22:12 INFO Client: Uploading resource file:/opt/sw/spark-2.4.4_emr6_kerberos/python/lib/pyspark.zip -> hdfs://ip-172-31-22-222.ec2.internal:8020/user/hadoop/.sparkStaging/application_1614359247442_0018/pyspark.zip
21/02/27 09:22:13 INFO Client: Uploading resource file:/opt/sw/spark-2.4.4_emr6_kerberos/python/lib/py4j-0.10.7-src.zip -> hdfs://ip-172-31-22-222.ec2.internal:8020/user/hadoop/.sparkStaging/application_1614359247442_0018/py4j-0.10.7-src.zip
21/02/27 09:22:14 INFO Client: Uploading resource file:/private/var/folders/38/ml5dcrkd6tdbfm8tk2kqb0880000gn/T/spark-7fdb5051-41b3-4d5d-b168-a90b09682f58/__spark_conf__3455622681190730836.zip -> hdfs://ip-172-31-22-222.ec2.internal:8020/user/hadoop/.sparkStaging/application_1614359247442_0018/__spark_conf__.zip
21/02/27 09:22:15 INFO SecurityManager: Changing view acls to: miguelr,hadoop
21/02/27 09:22:15 INFO SecurityManager: Changing modify acls to: miguelr,hadoop
21/02/27 09:22:15 INFO SecurityManager: Changing view acls groups to: 
21/02/27 09:22:15 INFO SecurityManager: Changing modify acls groups to: 
21/02/27 09:22:15 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(miguelr,hadoop); groups with view permissions: Set(); users  with modify permissions: Set(miguelr,hadoop); groups with modify permissions: Set()
21/02/27 09:22:15 INFO HadoopFSDelegationTokenProvider: getting token for: DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-160461870_1,ugi=hadoop/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY (auth:KERBEROS)]]
21/02/27 09:22:15 INFO DFSClient: Created token for hadoop: HDFS_DELEGATION_TOKEN owner=hadoop/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY, masterKeyId=2 on 3.92.137.244:8020
21/02/27 09:22:16 INFO KMSClientProvider: New token created: (Kind: kms-dt, Service: kms://http@ip-172-31-22-222.ec2.internal:9600/kms, Ident: (kms-dt owner=hadoop, issueDate=1614442936371, maxDate=1615047736371, masterKeyId=2))
21/02/27 09:22:16 INFO HadoopFSDelegationTokenProvider: getting token for: DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-160461870_1,ugi=hadoop/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY (auth:KERBEROS)]]
21/02/27 09:22:16 INFO DFSClient: Created token for hadoop: HDFS_DELEGATION_TOKEN owner=hadoop/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY, renewer=hadoop, issueDate=1614442936617, maxDate=1615047736617, sequenceNumber=29, masterKeyId=2 on 3.92.137.244:8020
21/02/27 09:22:16 INFO KMSClientProvider: New token created: (Kind: kms-dt, Service: kms://http@ip-172-31-22-222.ec2.internal:9600/kms, Ident: (kms-dt owner=hadoop, issueDate=1614442937045, maxDate=1615047737045, masterKeyId=2))
21/02/27 09:22:16 INFO HadoopFSDelegationTokenProvider: Renewal interval is 86400495 for token HDFS_DELEGATION_TOKEN
21/02/27 09:22:17 INFO HadoopFSDelegationTokenProvider: Renewal interval is 86400511 for token kms-dt
21/02/27 09:22:17 WARN HiveConf: HiveConf of name hive.server2.thrift.url does not exist
21/02/27 09:22:18 INFO Client: Submitting application application_1614359247442_0018 to ResourceManager
21/02/27 09:22:21 INFO YarnClientImpl: Application submission is not finished, submitted application application_1614359247442_0018 is still in NEW
21/02/27 09:24:18 INFO YarnClientImpl: Application submission is not finished, submitted application application_1614359247442_0018 is still in NEW
21/02/27 09:24:19 INFO Client: Deleted staging directory hdfs://ip-172-31-22-222.ec2.internal:8020/user/hadoop/.sparkStaging/application_1614359247442_0018
Exception in thread "main" org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1614359247442_0018 to YARN : Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, masterKeyId=2)
        at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:327)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:183)
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:1134)
        at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1526)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:853)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:928)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:937)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
21/02/27 09:24:19 INFO ShutdownHookManager: Shutdown hook called
21/02/27 09:24:19 INFO ShutdownHookManager: Deleting directory /private/var/folders/38/ml5dcrkd6tdbfm8tk2kqb0880000gn/T/spark-7fdb5051-41b3-4d5d-b168-a90b09682f58
21/02/27 09:24:19 INFO ShutdownHookManager: Deleting directory /private/var/folders/38/ml5dcrkd6tdbfm8tk2kqb0880000gn/T/spark-a45d84f6-f1c4-4b06-897a-70cb77aab91e
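
The DFSClient line above shows the token is bound to 3.92.137.244:8020, the public IP that the remote client resolves ip-172-31-22-222.ec2.internal to (the RMProxy line shows the same mapping), and that is the address the ResourceManager has to call renewDelegationToken on. To exercise that same RPC outside of Spark, a hedged sketch is hdfs fetchdt on the master node, naming yourself as renewer so the renewal is permitted:

# On the EMR master, with a valid ticket for the hadoop principal:
kinit -kt /etc/hadoop.keytab hadoop/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY
# Fetch an HDFS delegation token, naming "hadoop" as renewer so we may renew it ourselves:
hdfs fetchdt --renewer hadoop /tmp/token.dt
# Print the token, including the Service address it is bound to:
hdfs fetchdt --print /tmp/token.dt
# Renewing issues the same renewDelegationToken RPC the ResourceManager performs:
hdfs fetchdt --renew /tmp/token.dt

If this renewal fails from the master as well, the problem sits between the master node and the NameNode address baked into the token, rather than in the spark-submit invocation itself.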
