MapReduce job stuck waiting for AM container to be allocated, launched and register with RM

The s3distcp job hangs after the last line of the output below, and the application log is as shown in the attached image.

2020-10-13 15:00:24,983 INFO s3distcp.S3distCp: AmazonS3Client setEndpoint s3.amazonaws.com
2020-10-13 15:00:25,430 INFO s3distcp.FileInfoListing: opening new file: hdfs:/tmp/4cb50b7d-0f5b-4cfa-a233-a01aa0b08d29/files/1
2020-10-13 15:00:25,531 INFO s3distcp.S3distCp: Created 1 files to copy 26 files 
2020-10-13 15:00:25,748 INFO s3distcp.S3distCp: Reducer number: 16
2020-10-13 15:00:25,991 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-2-237.us-east-2/172.31.2.237:8032
2020-10-13 15:00:26,209 INFO client.AHSProxy: Connecting to Application History server at ip-172-31-2-237:10200
2020-10-13 15:00:26,323 INFO mapreduce.JobResourceUploader: disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1602590975347_0002
2020-10-13 15:00:26,700 INFO input.FileInputFormat: Total input files to process : 1
2020-10-13 15:00:26,740 INFO mapreduce.JobSubmitter: number of splits:1
2020-10-13 15:00:26,888 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1602590975347_0002
2020-10-13 15:00:26,890 INFO mapreduce.JobSubmitter: Executing with tokens: []
2020-10-13 15:00:27,092 INFO conf.Configuration: resource-types.xml not found
2020-10-13 15:00:27,093 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2020-10-13 15:00:27,162 INFO impl.YarnClientImpl: Submitted application application_1602590975347_0002
2020-10-13 15:00:27,206 INFO mapreduce.Job: The url to track the job: http://ip-172-31-2-237:20888/proxy/application_1602590975347_0002/
2020-10-13 15:00:27,207 INFO mapreduce.Job: Running job: job_1602590975347_0002

The NodeManager status is as follows:

● hadoop-yarn-nodemanager.service - Hadoop nodemanager
   Loaded: loaded (/etc/systemd/system/hadoop-yarn-nodemanager.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2020-10-13 15:00:02 UTC; 41s ago
  Process: 3285 ExecStart=/etc/init.d/hadoop-yarn-nodemanager start (code=exited,status=0/SUCCESS)
 Main PID: 3360 (java)
    Tasks: 0
   Memory: 2.1M
   CGroup: /system.slice/hadoop-yarn-nodemanager.service
           ‣ 3360 /etc/alternatives/jre/bin/java -Dproc_nodemanager -Djava.net.preferIPv4Stack=true -Dsun.net.inetaddr.ttl=30 -Dyarn.log.dir=/var/log/hadoop-yarn -Dyarn.log.file=hadoop-yarn-nodemanager-...

Oct 13 14:59:55 ip-172-31-2-236 systemd[1]: Starting Hadoop nodemanager...
Oct 13 14:59:55 ip-172-31-2-236 su[3313]: (to yarn) root on none
Oct 13 14:59:55 ip-172-31-2-236 hadoop-yarn-nodemanager[3285]: WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
Oct 13 15:00:02 ip-172-31-2-236 hadoop-yarn-nodemanager[3285]: Started Hadoop nodemanager:[  OK  ]
Oct 13 15:00:02 ip-172-31-2-236 systemd[1]: Started Hadoop nodemanager.

The jps output on the NameNode and the DataNode is as follows:

--------------Namenode--------
17063 ApplicationHistoryServer
17576 LivyServer
6665 Main
21513 ResourceManager
28682 RunJar
21261 NameNode
16366 HttpFSServerWebServer
22062 JobHistoryServer
6446 Main
18449 HistoryServer
7412 Jps
6461 Main
15581 KMSWebServer
21918 WebAppProxyServer

--------------Datanode--------
3360 NodeManager
7475 Main
7763 Main
3977 Jps
7465 Main
30974 Datanode

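The yarn node -list --all command does not show any nodes, healthy or unhealthy, and the node is not visible in the web UI either. The DFS health report, however, looks like this: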

[hadoop@ip-172-31-2-237 ~]$ hdfs dfsadmin -report
Configured Capacity: 540456443904 (503.34 GB)
Present Capacity: 512874729472 (477.65 GB)
DFS Remaining: 512860467200 (477.64 GB)
DFS Used: 14262272 (13.60 MB)
DFS Used%: 0.00%
Replicated Blocks:
    Under replicated blocks: 22
    Blocks with corrupt replicas: 0
    Missing blocks: 0
    Missing blocks (with replication factor 1): 0
    Low redundancy blocks with highest priority to recover: 22
    Pending deletion blocks: 0
Erasure Coded Block Groups: 
    Low redundancy block groups: 0
    Block groups with corrupt internal blocks: 0
    Missing block groups: 0
    Low redundancy blocks with highest priority to recover: 0
    Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (1):

Name: 172.31.2.236:9866 (ip-172-31-2-236)
Hostname: ip-172-31-2-236
Decommission Status : normal
Configured Capacity: 540456443904 (503.34 GB)
DFS Used: 14262272 (13.60 MB)
Non DFS Used: 77148160 (73.57 MB)
DFS Remaining: 512860467200 (477.64 GB)
DFS Used%: 0.00%
DFS Remaining%: 94.89%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Tue Oct 13 15:13:03 UTC 2020
Last Block Report: Tue Oct 13 14:20:57 UTC 2020
Num of Blocks: 29
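
To cross-check what yarn node -list reports, the ResourceManager's REST endpoint /ws/v1/cluster/nodes can be queried directly. The following is a minimal sketch, not a definitive diagnostic: the function names are hypothetical, the hostname is taken from the logs above, and port 8088 is assumed to be the ResourceManager web port (the Hadoop default, which may differ on this cluster).

```python
import json
from collections import Counter
from urllib.request import urlopen


def summarize_nodes(payload: dict) -> Counter:
    """Count nodes per state in a /ws/v1/cluster/nodes response.

    Treats a missing or null "nodes" entry as "no nodes registered".
    """
    nodes = (payload.get("nodes") or {}).get("node") or []
    return Counter(n.get("state", "UNKNOWN") for n in nodes)


def fetch_nodes(rm_host: str, rm_port: int = 8088) -> dict:
    """Fetch the node list from the ResourceManager REST API.

    rm_port=8088 is an assumption (Hadoop default web port).
    """
    url = f"http://{rm_host}:{rm_port}/ws/v1/cluster/nodes"
    with urlopen(url, timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hostname copied from the logs above.
    print(summarize_nodes(fetch_nodes("ip-172-31-2-237")))
```

An empty counter here would agree with the empty yarn node -list output, meaning the NodeManager has not registered with the ResourceManager at all.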

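In addition, port 8042, which should be open once the NodeManager has started, is not open, even though the NodeManager status shows it as running. I have tried different solutions without any luck. Please help.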
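
For reference, the port check can be reproduced with a short script. This is only a sketch of one way to test it: port_is_open is a hypothetical helper name, and the hostname is the DataNode host from the output above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # 8042 is the NodeManager web port mentioned above.
    print(port_is_open("ip-172-31-2-236", 8042))
```

A False result from the node itself, and not only from a remote host, would suggest the NodeManager process never bound the port, as opposed to a firewall or security group blocking it.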

Thanks,

Application Log

Nodes
