
Errors when reading/writing files in HDFS from Python using subprocess, PIPE and Popen

I am trying to read (open) and write files in HDFS from inside a Python script, but I am getting errors. Can anyone tell me what is going wrong here?

Code (complete): sample.py

#!/usr/bin/python

from subprocess import Popen, PIPE

print "Before Loop"

cat = Popen(["hadoop", "fs", "-cat", "./sample.txt"],
            stdout=PIPE)

print "After Loop 1"
put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
            stdin=PIPE)

print "After Loop 2"
for line in cat.stdout:
    line += "Blah"
    print line
    print "Inside Loop"
    put.stdin.write(line)

cat.stdout.close()
cat.wait()
put.stdin.close()
put.wait()

When I execute:

hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.5.1.jar -file ./sample.py -mapper './sample.py' -input sample.txt -output fileRead

it runs without errors, but I cannot find the file modifiedfile.txt that should have been created in HDFS.

And when I run:

 hadoop fs -getmerge ./fileRead/ file.txt

I get the following inside file.txt:

Before Loop 
Before Loop 
After Loop 1    
After Loop 1    
After Loop 2    
After Loop 2

Can someone tell me what I am doing wrong here? I don't think it is reading from sample.txt at all.
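One clue in that output: "Inside Loop" never gets printed, so the for loop over cat.stdout is not producing a single line — the -cat subprocess apparently returns nothing when run inside the mapper. A quick sanity check (run from the shell, outside the streaming job) is to confirm that the same command actually prints the file:

hadoop fs -cat ./sample.txt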

Solution:

Try changing the put subprocess so that it takes cat's stdout directly, i.e. change this:

put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
            stdin=PIPE)

into this:

put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
            stdin=cat.stdout)

Full script:

#!/usr/bin/python

from subprocess import Popen, PIPE

print "Before Loop"

cat = Popen(["hadoop", "fs", "-cat", "./sample.txt"],
            stdout=PIPE)

print "After Loop 1"
put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
            stdin=cat.stdout)
put.communicate()
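Connecting the two hadoop processes like this hands the data transfer off to the operating system: put reads straight from cat, and put.communicate() just waits until both commands have finished. Note that this version no longer appends "Blah" to every line. If you still need to modify the lines on the way through, a minimal sketch in the same Python 2 style as the original (it buffers the whole file in memory, so it only suits small files) would be:

#!/usr/bin/python

from subprocess import Popen, PIPE

# Read the whole file out of HDFS
cat = Popen(["hadoop", "fs", "-cat", "./sample.txt"],
            stdout=PIPE)
data, _ = cat.communicate()

# Append "Blah" to every line
modified = "".join(line.rstrip("\n") + "Blah\n" for line in data.splitlines(True))

# Stream the modified content back into HDFS via "-put -"
put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
            stdin=PIPE)
put.communicate(modified)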
