YARN container transitions to RUNNING state but never completes

I am trying to run the first example workflow (identity-wf) from the Apache Oozie book on my 5-node cluster.

  • hadoop1 / NameNode, ResourceManager
  • hadoop2 / SecondaryNameNode
  • hadoop3 / DataNode, NodeManager (3GB RAM)
  • hadoop4 / DataNode, NodeManager (3GB RAM)
  • hadoop5 / DataNode, NodeManager (3GB RAM)
  • hadoop6 / Oozie server

The Hadoop version is 2.10.1 and the Oozie version is 5.2.1.

workflow.xml

<workflow-app xmlns="uri:oozie:workflow:0.4" name="identity-WF">

  <parameters>
    <property>
      <name>jobTracker</name>
    </property>
    <property>
      <name>nameNode</name>
    </property>
    <property>
      <name>exampleDir</name>
    </property>
  </parameters>

  <start to="identity-MR"/>

  <action name="identity-MR">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <prepare>
        <delete path="${exampleDir}/data/output"/>
      </prepare>
      <configuration>
        <property>
          <name>mapred.mapper.class</name>
          <value>org.apache.hadoop.mapred.lib.IdentityMapper</value>
        </property>
        <property>
          <name>mapred.reducer.class</name>
          <value>org.apache.hadoop.mapred.lib.IdentityReducer</value>
        </property>
        <property>
          <name>mapred.input.dir</name>
          <value>${exampleDir}/data/input</value>
        </property>
        <property>
          <name>mapred.output.dir</name>
          <value>${exampleDir}/data/output</value>
        </property>
        <property>
          <name>oozie.launcher.mapreduce.map.java.opts</name>
          <value>-verbose</value>
        </property>
        <property>
          <name>oozie.launcher.mapreduce.map.memory.mb</name>
          <value>512</value>
        </property>
        <property>
          <name>oozie.launcher.yarn.app.mapreduce.am.resource.mb</name>
          <value>512</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="success"/>
    <error to="fail"/>
  </action>

  <kill name="fail">
    <message>The Identity Map-Reduce job Failed!</message>
  </kill>

  <end name="success"/>

</workflow-app>

The container is created and transitions to the RUNNING state, but it eventually times out.

ResourceManager log

2021-05-10 13:40:36,733 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1620621418881_0001_01_000001 Container Transitioned from NEW to ALLOCATED
2021-05-10 13:40:36,734 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=oozie    OPERATION=AM Allocated Container        TARGET=SchedulerApp     RESULT=SUCCESS  APPID=application_1620621418881_0001    CONTAINERID=container_1620621418881_0001_01_000001      RESOURCE=<memory:2048,vCores:1>        QUEUENAME=default
2021-05-10 13:40:36,736 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.083333336 absoluteUsedCapacity=0.083333336 used=<memory:2048,vCores:1> cluster=<memory:24576,vCores:24>
2021-05-10 13:40:36,736 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Allocation proposal accepted
2021-05-10 13:40:36,771 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Sending NMToken for nodeId : hadoop3:34683 for container : container_1620621418881_0001_01_000001
2021-05-10 13:40:36,786 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1620621418881_0001_01_000001 Container Transitioned from ALLOCATED to ACQUIRED
2021-05-10 13:40:36,787 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Clear node set for appattempt_1620621418881_0001_000001
2021-05-10 13:40:36,787 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Storing attempt: AppId: application_1620621418881_0001 AttemptId: appattempt_1620621418881_0001_000001 MasterContainer: Container: [ContainerId: container_1620621418881_0001_01_000001,AllocationRequestId: 0,Version: 0,NodeId: hadoop3:34683,NodeHttpAddress: hadoop3:8042,Resource: <memory:2048,vCores:1>,Priority: 0,Token: Token { kind: ContainerToken,service: 192.168.35.67:34683 },ExecutionType: GUARANTEED,]
2021-05-10 13:40:36,812 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1620621418881_0001_000001 State change from SCHEDULED to ALLOCATED_SAVING on event = CONTAINER_ALLOCATED
2021-05-10 13:40:36,830 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1620621418881_0001_000001 State change from ALLOCATED_SAVING to ALLOCATED on event = ATTEMPT_NEW_SAVED
2021-05-10 13:40:36,842 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching masterappattempt_1620621418881_0001_000001
2021-05-10 13:40:36,993 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=oozie    IP=192.168.35.80        OPERATION=Get Applications Request      TARGET=ClientRMService  RESULT=SUCCESS
2021-05-10 13:40:37,035 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting up container Container: [ContainerId: container_1620621418881_0001_01_000001,] for AM appattempt_1620621418881_0001_000001
2021-05-10 13:40:37,037 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Create AMRMToken for ApplicationAttempt: appattempt_1620621418881_0001_000001
2021-05-10 13:40:37,048 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Creating password for appattempt_1620621418881_0001_000001
2021-05-10 13:40:37,619 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Done launching container Container: [ContainerId: container_1620621418881_0001_01_000001,619 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1620621418881_0001_000001 State change from ALLOCATED to LAUNCHED on event = LAUNCHED
2021-05-10 13:40:37,620 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: update the launch time for applicationId: application_1620621418881_0001,attemptId: appattempt_1620621418881_0001_000001launchTime: 1620621637619
2021-05-10 13:40:37,620 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating info for app: application_1620621418881_0001
2021-05-10 13:40:37,704 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1620621418881_0001_01_000001 Container Transitioned from ACQUIRED to RUNNING
2021-05-10 13:46:44,212 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler: Release request cache is cleaned up
2021-05-10 13:51:35,713 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=oozie    IP=192.168.35.80        OPERATION=Get Applications Request      TARGET=ClientRMService  RESULT=SUCCESS
2021-05-10 13:53:39,032 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:appattempt_1620621418881_0001_000001 Timed out after 600 secs
2021-05-10 13:53:39,034 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Updating application attempt appattempt_1620621418881_0001_000001 with final state: Failed,and exit status: -1000
2021-05-10 13:53:39,035 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1620621418881_0001_000001 State change from LAUNCHED to FINAL_SAVING on event = EXPIRE
2021-05-10 13:53:39,036 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Unregistering app attempt : appattempt_1620621418881_0001_000001
2021-05-10 13:53:39,036 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Application finished,removing password for appattempt_1620621418881_0001_000001
2021-05-10 13:53:39,037 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1620621418881_0001_000001 State change from FINAL_SAVING to Failed on event = ATTEMPT_UPDATE_SAVED
2021-05-10 13:53:39,037 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: The number of Failed attempts is 1. The max attempts is 2
2021-05-10 13:53:39,037 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering app attempt : appattempt_1620621418881_0001_000002
2021-05-10 13:53:39,037 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1620621418881_0001_000002 State change from NEW to SUBMITTED on event = START
2021-05-10 13:53:39,037 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application Attempt appattempt_1620621418881_0001_000001 is done. finalState=Failed
2021-05-10 13:53:39,042 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1620621418881_0001_01_000001 Container Transitioned from RUNNING to KILLED
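
To make the timing in the log above easier to read, the container state transitions can be pulled out mechanically. The following is only a minimal sketch, assuming the ResourceManager log keeps the line format shown above; the function names `container_transitions` and `time_in_state` are hypothetical helpers written for this illustration, not part of Hadoop or Oozie.

```python
import re
from datetime import datetime

# Matches ResourceManager lines of the form shown in the log above, e.g.
# "2021-05-10 13:40:37,704 INFO ...RMContainerImpl: container_..._000001
#  Container Transitioned from ACQUIRED to RUNNING"
TRANSITION = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*?"
    r"(?P<cid>container_\d+_\d+_\d+_\d+) Container Transitioned "
    r"from (?P<src>\w+) to (?P<dst>\w+)"
)


def container_transitions(lines):
    """Return (timestamp, container_id, from_state, to_state) tuples."""
    found = []
    for line in lines:
        match = TRANSITION.match(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
            found.append(
                (ts, match.group("cid"), match.group("src"), match.group("dst"))
            )
    return found


def time_in_state(transitions, state):
    """Map container id -> seconds spent in `state` before leaving it."""
    entered = {}
    durations = {}
    for ts, cid, src, dst in transitions:
        if dst == state:
            entered[cid] = ts
        elif src == state and cid in entered:
            durations[cid] = (ts - entered.pop(cid)).total_seconds()
    return durations


# Two lines copied from the ResourceManager log above.
sample = [
    "2021-05-10 13:40:37,704 INFO org.apache.hadoop.yarn.server.resourcemanager."
    "rmcontainer.RMContainerImpl: container_1620621418881_0001_01_000001 "
    "Container Transitioned from ACQUIRED to RUNNING",
    "2021-05-10 13:53:39,042 INFO org.apache.hadoop.yarn.server.resourcemanager."
    "rmcontainer.RMContainerImpl: container_1620621418881_0001_01_000001 "
    "Container Transitioned from RUNNING to KILLED",
]

if __name__ == "__main__":
    for cid, seconds in time_in_state(container_transitions(sample), "RUNNING").items():
        print(f"{cid} spent {seconds:.3f} s in RUNNING")
```

Run against the full log, this shows that the container sat in RUNNING from 13:40:37 until it was killed at 13:53:39, right after the attempt expired with "Timed out after 600 secs". The same functions should work on any iterable of lines, for example an open file handle for the ResourceManager log.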

Container log

log4j: Trying to find [container-log4j.properties] using context classloader sun.misc.Launcher$AppClassLoader@7852e922.
log4j: Using URL [jar:file:/opt/hadoop-2.10.1/share/hadoop/yarn/hadoop-yarn-server-nodemanager-2.10.1.jar!/container-log4j.properties] for automatic log4j configuration.
log4j: Reading configuration from URL jar:file:/opt/hadoop-2.10.1/share/hadoop/yarn/hadoop-yarn-server-nodemanager-2.10.1.jar!/container-log4j.properties
log4j: Hierarchy threshold set to [ALL].
log4j: Parsing for [root] with value=[INFO,CLA,EventCounter].
log4j: Level token is [INFO].
log4j: Category root set to INFO
log4j: Parsing appender named "CLA".
log4j: Parsing layout options for "CLA".
log4j: Setting property [conversionPattern] to [%d{ISO8601} %p [%t] %c: %m%n].
log4j: End of parsing for "CLA".
log4j: Setting property [containerLogFile] to [syslog].
log4j: Setting property [totalLogFileSize] to [1048576].
log4j: Setting property [containerLogDir] to [/var/hadoop/yarn/logs/userlogs/application_1620621418881_0002/container_1620621418881_0002_01_000001].
log4j: setFile called: /var/hadoop/yarn/logs/userlogs/application_1620621418881_0002/container_1620621418881_0002_01_000001/syslog,true
log4j: setFile ended
log4j: Parsed "CLA" options.
log4j: Parsing appender named "EventCounter".
log4j: Parsed "EventCounter" options.
log4j: Parsing for [org.apache.hadoop.mapreduce.task.reduce] with value=[INFO,CLA].
log4j: Level token is [INFO].
log4j: Category org.apache.hadoop.mapreduce.task.reduce set to INFO
log4j: Parsing appender named "CLA".
log4j: Appender "CLA" was already parsed.
log4j: Handling log4j.additivity.org.apache.hadoop.mapreduce.task.reduce=[false]
log4j: Setting additivity for "org.apache.hadoop.mapreduce.task.reduce" to false
log4j: Parsing for [org.apache.hadoop.mapred.Merger] with value=[INFO,CLA].
log4j: Level token is [INFO].
log4j: Category org.apache.hadoop.mapred.Merger set to INFO
log4j: Parsing appender named "CLA".
log4j: Appender "CLA" was already parsed.
log4j: Handling log4j.additivity.org.apache.hadoop.mapred.Merger=[false]
log4j: Setting additivity for "org.apache.hadoop.mapred.Merger" to false
log4j: Finished configuring.
Launcher AM configuration loaded
Executing Oozie Launcher with tokens:
Kind: YARN_AM_RM_TOKEN,Service:,Ident: (appAttemptId { application_id { id: 2 cluster_timestamp: 1620621418881 } attemptId: 1 } keyId: -1917584279

How can I find out what the problem is? Thanks.
