如何解决具有并行执行的 bash 脚本
我正在尝试在 bash 脚本中使用 parallel
来验证 s3 路径是否存在,并且我正在尝试通过计算路径中的对象来验证多个 s3 路径。如果对象的计数为零,它将继续到 for 循环中的下一个日期,而 parallel 它不会按预期工作。
对于我在 for
循环中提供的日期范围,我们实际上在 s3bucket 中没有这些文件夹,并且在函数 checkS3Path
中,如果 s3 路径不存在,我正在创建一个 0KB 文件,但我没有看到脚本执行后创建的那些 0KB 文件。从脚本的输出中,我看到的是 S3 Path Consists CSV Files,Proceeding to next step folder1:+2019-10-03
,而不是 S3 Path Doesnt Exists folder1:+2019-10-03
。请参阅下面的输出。
请告诉我可能是什么问题。
这是示例代码。
#!/bin/bash
#set -x
s3Bucket=testbucket
version=v20
Array=(folder1 folder2 folder3)
checkS3Path() {
fldName=$1
date=$2
objectNum=$(aws s3 ls s3://${s3Bucket}/${version}/${fldName}/date=${date}/ | wc -l)
echo $objectNum
if [ "$objectNum" -eq 0 ]
then
echo "S3 Path Doesnt Exists ${fldName}:${date}" >> /app/${fldName}.log
touch /home/ubuntu/${fldName}_${date}.txt
continue
else
echo "S3 Path Consists csv Files,Proceeding to next step ${fldName}:${date}"
fi
}
final() {
fldName=$1
date=$2
checkS3Path $fldName $date
function2 $fldName $date
function3 $fldName $date
}
export -f final checkS3Path
for date in 2019-10-{01..03}
do
# finalstep folder1 $date
parallel --jobs 4 --eta finalstep ::: "${Array[@]}" ::: +"$date"
done
这是我看到的输出。
$ ./test.sh
Academic Tradition requires you to cite works you base your article on.
When using programs that use GNU Parallel to process data for publication
please cite:
O. Tange (2011): GNU Parallel - The Command-Line Power Tool,;login: The USENIX Magazine,February 2011:42-47.
This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
To silence this citation notice: run 'parallel --citation'.
Computers / cpu cores / Max jobs to run
1:local / 4 / 4
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 0s Left: 14 AVG: 0.00s local:4/0/100%/0.0s 202
S3 Path Consists CSV Files,Proceeding to next step folder1:+2019-10-01
ETA: 0s Left: 13 AVG: 0.00s local:4/1/100%/2.0s 202
S3 Path Consists CSV Files,Proceeding to next step folder2:+2019-10-01
ETA: 0s Left: 12 AVG: 0.00s local:4/2/100%/1.0s 202
S3 Path Consists CSV Files,Proceeding to next step folder3:+2019-10-01
Academic Tradition requires you to cite works you base your article on.
When using programs that use GNU Parallel to process data for publication
please cite:
O. Tange (2011): GNU Parallel - The Command-Line Power Tool,Proceeding to next step folder1:+2019-10-02
ETA: 0s Left: 13 AVG: 0.00s local:4/1/100%/0.0s 202
S3 Path Consists CSV Files,Proceeding to next step folder2:+2019-10-02
ETA: 6s Left: 12 AVG: 0.50s local:4/2/100%/0.5s 202
S3 Path Consists CSV Files,Proceeding to next step folder3:+2019-10-02
ETA: 3s Left: 11 AVG: 0.33s local:4/3/100%/0.3s 202
Academic Tradition requires you to cite works you base your article on.
When using programs that use GNU Parallel to process data for publication
please cite:
O. Tange (2011): GNU Parallel - The Command-Line Power Tool,Proceeding to next step folder1:+2019-10-03
ETA: 0s Left: 13 AVG: 0.00s local:4/1/100%/1.0s 202
S3 Path Consists CSV Files,Proceeding to next step folder2:+2019-10-03
ETA: 0s Left: 12 AVG: 0.00s local:4/2/100%/0.5s 202
S3 Path Consists CSV Files,Proceeding to next step folder3:+2019-10-03
ETA: 0s Left: 11 AVG: 0.00s local:4/3/100%/0.3s 202
$
谢谢
解决方法
如果 checkS3Path
在手动运行时有效,那么您可能只需要:
export s3Bucket=testbucket
export version=v20
每个 GNU Parallel 作业都在其自己的 shell(从 Perl 启动)中运行,这就是您需要导出变量的原因,如果您希望它们对作业可见。
另请参阅 env_parallel
以自动执行此操作。
版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。