"bad input path" when setting up a simple MRJob on a single-node EC2 instance

I am trying to run a simple word count program in Python using Hadoop and mrjob. I have installed pseudo-distributed Hadoop 2.7.3 on a single t2.micro EC2 instance. I run the program like this:

python mr_word_count.py -r hadoop hdfs:///user/ubuntu/input/lorem.txt  -o output

But it fails with the following error:

Using configs in /home/ubuntu/.mrjob.conf
Looking for hadoop binary in /home/ubuntu/hadoop/hadoop-2.7.3/bin...
Found hadoop binary: /home/ubuntu/hadoop/hadoop-2.7.3/bin/hadoop
Using Hadoop version 2.7.3
Creating temp directory /tmp/mr_word_count.ubuntu.20210403.013125.236375
uploading working dir files to hdfs:///user/ubuntu/tmp/mrjob/mr_word_count.ubuntu.20210403.013125.236375/files/wd...
copying other local files to hdfs:///user/ubuntu/tmp/mrjob/mr_word_count.ubuntu.20210403.013125.236375/files/
Running step 1 of 1...
  Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  session.id is deprecated. Instead,use dfs.metrics.session-id
  Initializing JVM Metrics with processName=JobTracker,sessionId=
  Cannot initialize JVM Metrics with processName=JobTracker,sessionId= - already initialized
  Cleaning up the staging area file:/tmp/mapred/staging/ubuntu1155540475/.staging/job_local1155540475_0001
  Error launching job,bad input path : File does not exist: /tmp/mapred/staging/ubuntu1155540475/.staging/job_local1155540475_0001/files/mr_word_count.py#mr_word_count.py
  Streaming Command Failed!
Attempting to fetch counters from logs...
Can't fetch history log; missing job ID
No counters found
Scanning logs for probable cause of failure...
Can't fetch history log; missing job ID
Can't fetch task logs; missing application ID
Step 1 of 1 Failed: Command '['/home/ubuntu/hadoop/hadoop-2.7.3/bin/hadoop','jar','/home/ubuntu/hadoop/hadoop-2.7.3/share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar','-files','hdfs:///user/ubuntu/tmp/mrjob/mr_word_count.ubuntu.20210403.013125.236375/files/wd/mr_word_count.py#mr_word_count.py,hdfs:///user/ubuntu/tmp/mrjob/mr_word_count.ubuntu.20210403.013125.236375/files/wd/mrjob.zip#mrjob.zip,hdfs:///user/ubuntu/tmp/mrjob/mr_word_count.ubuntu.20210403.013125.236375/files/wd/setup-wrapper.sh#setup-wrapper.sh','-input','hdfs:///user/ubuntu/input/lorem.txt','-output','hdfs:///user/ubuntu/output','-mapper','/bin/sh -ex setup-wrapper.sh python3 mr_word_count.py --step-num=0 --mapper','-combiner','/bin/sh -ex setup-wrapper.sh python3 mr_word_count.py --step-num=0 --combiner','-reducer','/bin/sh -ex setup-wrapper.sh python3 mr_word_count.py --step-num=0 --reducer']' returned non-zero exit status 512.

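It looks as though the runner should have copied my program to /tmp/mapred/staging/ but did not, so I suspect I am missing a configuration setting somewhere. The Python code exists only on the local filesystem; the input file is in HDFS.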
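One detail in the log above may help narrow this down: the job ID is `job_local1155540475_0001` and the staging area is a `file:` path. Hadoop's LocalJobRunner produces IDs of the form `job_local<digits>_<digits>`, whereas jobs submitted to YARN get IDs like `job_<cluster timestamp>_<sequence>`. The sketch below is a diagnostic aid only, not a fix. The function names are hypothetical, and the regular expression is an assumption based on the two ID formats just described.

```python
import re

# Assumed ID formats: "job_local<digits>_<digits>" for the LocalJobRunner,
# "job_<digits>_<digits>" for a job submitted to YARN.
JOB_ID_RE = re.compile(r"\bjob_(local)?\d+_\d+\b")


def find_job_ids(log_text):
    """Map every job ID found in the log to True if it looks like a local job."""
    found = {}
    for match in JOB_ID_RE.finditer(log_text):
        found[match.group(0)] = match.group(1) is not None
    return found


def staging_is_local(log_text):
    """True if the log mentions a staging area on the local filesystem (file: scheme)."""
    return "staging area file:" in log_text


if __name__ == "__main__":
    # Two lines copied from the error output in the question.
    log = (
        "Cleaning up the staging area file:/tmp/mapred/staging/"
        "ubuntu1155540475/.staging/job_local1155540475_0001\n"
        "Streaming Command Failed!\n"
    )
    for job_id, is_local in find_job_ids(log).items():
        kind = "local runner" if is_local else "cluster"
        print(f"{job_id}: looks like a {kind} job")
    print("staging on local filesystem:", staging_is_local(log))
```

If the job ID is reported as local even though `-r hadoop` was requested, that would suggest the streaming job never reached YARN, which is worth ruling out before looking elsewhere.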

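I have seen a number of questions here with almost the same error (two in particular), but none of the suggested changes to the configuration XML fixed it. The job works if I run it in local (-r local) or inline (-r inline) mode, but not under the Hadoop runner (-r hadoop).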
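Since the question turns on what the configuration XML actually contains, a small script can print the value Hadoop would read rather than relying on inspection by eye. This is a sketch under stated assumptions: the helper names are hypothetical, and the choice of `mapreduce.framework.name` as the property to inspect is a guess at what is relevant here (its documented default in `mapred-default.xml` is `local`). The file location `etc/hadoop/mapred-site.xml` under the Hadoop install directory is likewise assumed from the standard Hadoop 2.x layout.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text):
    """Parse a Hadoop *-site.xml document into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            # A missing or empty <value> is stored as an empty string.
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def framework_name(xml_text):
    """Return mapreduce.framework.name, falling back to Hadoop's default of 'local'."""
    return read_hadoop_properties(xml_text).get("mapreduce.framework.name", "local")


# Illustrative content only; this is NOT the configuration from the question.
SAMPLE_MAPRED_SITE = """<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print("mapreduce.framework.name =", framework_name(SAMPLE_MAPRED_SITE))
```

To check a real installation, read the file's text (for example with `pathlib.Path(...).read_text()`) and pass it to `framework_name`.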

Here is the program I am trying to run: https://gist.github.com/k4v/5d0d1425977fe7e228e7a1e538f72d68
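For readers who cannot open the gist, the logic of such a job can be sketched in plain Python without mrjob. This is not the code from the gist: the tokenising rule, the lowercasing, and all function names are assumptions, and the mapper and reducer phases are only loosely modelled on the `--mapper` and `--reducer` steps visible in the failing command.

```python
import re
from collections import defaultdict

# Assumed tokeniser: runs of word characters and apostrophes.
WORD_RE = re.compile(r"[\w']+")


def mapper(line):
    """Emit (word, 1) for every word in one input line."""
    for word in WORD_RE.findall(line):
        yield word.lower(), 1


def reducer(word, counts):
    """Sum the partial counts collected for one word."""
    yield word, sum(counts)


def run_word_count(lines):
    """Simulate map, shuffle, and reduce in a single process."""
    grouped = defaultdict(list)
    for line in lines:
        for word, count in mapper(line):
            grouped[word].append(count)  # shuffle: group values by key
    totals = {}
    for word in sorted(grouped):
        for key, total in reducer(word, grouped[word]):
            totals[key] = total
    return totals


if __name__ == "__main__":
    for word, total in run_word_count(["Lorem ipsum lorem", "ipsum dolor"]).items():
        print(word, total)
```

Because the job already succeeds with `-r local` and `-r inline`, logic of this kind is unlikely to be the cause; the failure occurs before any mapper runs.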

Hadoop configuration files

The following processes are running:

$ jps
23283 Jps
21846 NodeManager
21545 SecondaryNameNode
21674 ResourceManager
21325 DataNode
21149 NameNode
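The listing above can also be checked mechanically. The sketch below parses `jps` output and reports which daemons are absent; the set of expected daemons for a pseudo-distributed HDFS plus YARN setup is an assumption, and the helper names are hypothetical.

```python
# Assumed daemon set for a pseudo-distributed HDFS + YARN node.
EXPECTED_DAEMONS = (
    "NameNode",
    "DataNode",
    "SecondaryNameNode",
    "ResourceManager",
    "NodeManager",
)


def running_daemons(jps_output):
    """Return the process names from lines shaped like '<pid> <name>'."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].isdigit():
            names.add(parts[1])
    return names


def missing_daemons(jps_output, expected=EXPECTED_DAEMONS):
    """List expected daemons that do not appear in the jps output."""
    # Compare case-insensitively so capitalisation differences in pasted
    # output do not produce false alarms.
    running = {name.lower() for name in running_daemons(jps_output)}
    return sorted(name for name in expected if name.lower() not in running)


if __name__ == "__main__":
    jps_text = """23283 Jps
21846 NodeManager
21545 SecondaryNameNode
21674 ResourceManager
21325 DataNode
21149 NameNode"""
    print("missing daemons:", missing_daemons(jps_text))
```

All five daemons are present here, so both HDFS and YARN are up; that makes a daemon that failed to start an unlikely explanation for the error.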

Please help me figure out what I am missing. Thanks.
