
Bash script with parallel execution

How to fix a bash script that uses parallel execution

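I am trying to use parallel in a bash script to verify whether S3 paths exist. I check several S3 paths by counting the objects under each one; if the count is zero, the script should move on to the next date in the for loop. With parallel, this does not work as expected.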

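For the date range I supply in the for loop, the corresponding folders do not actually exist in the S3 bucket. In the checkS3Path function I create a 0 KB file whenever the S3 path does not exist, but after the script runs I don't see those 0 KB files. In the script's output I see "S3 Path Consists CSV Files,Proceeding to next step folder1:+2019-10-03" instead of "S3 Path Doesnt Exists folder1:+2019-10-03". Please see the output below.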

Please let me know what the problem might be.

Here is the sample code:

#!/bin/bash
#set -x
s3Bucket=testbucket
version=v20
Array=(folder1 folder2 folder3)

checkS3Path() {
  fldName=$1
  date=$2
  objectNum=$(aws s3 ls s3://${s3Bucket}/${version}/${fldName}/date=${date}/ | wc -l)
  echo $objectNum
  if [ "$objectNum" -eq  0 ]
  then
    echo "S3 Path Doesnt Exists ${fldName}:${date}" >> /app/${fldName}.log
    touch /home/ubuntu/${fldName}_${date}.txt
    continue
  else
    echo "S3 Path Consists CSV Files,Proceeding to next step ${fldName}:${date}"
  fi
}

finalstep() {
  fldName=$1
  date=$2
  checkS3Path $fldName $date
  function2 $fldName $date
  function3 $fldName $date
}

export -f finalstep checkS3Path

for date in 2019-10-{01..03}
do
#  finalstep folder1 $date
  parallel --jobs 4 --eta finalstep ::: "${Array[@]}" ::: +"$date"
done

Here is the output I see:

$ ./test.sh
Academic Tradition requires you to cite works you base your article on.
When using programs that use GNU Parallel to process data for publication
please cite:

  O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
  ;login: The USENIX Magazine, February 2011:42-47.

This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.

To silence this citation notice: run 'parallel --citation'.


Computers / cpu cores / Max jobs to run
1:local / 4 / 4

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 0s Left: 14 AVG: 0.00s  local:4/0/100%/0.0s 202
S3 Path Consists CSV Files,Proceeding to next step folder1:+2019-10-01
ETA: 0s Left: 13 AVG: 0.00s  local:4/1/100%/2.0s 202
S3 Path Consists CSV Files,Proceeding to next step folder2:+2019-10-01
ETA: 0s Left: 12 AVG: 0.00s  local:4/2/100%/1.0s 202
S3 Path Consists CSV Files,Proceeding to next step folder3:+2019-10-01
Academic Tradition requires you to cite works you base your article on.
[... rest of the citation notice and job header, as in the first run ...]
S3 Path Consists CSV Files,Proceeding to next step folder1:+2019-10-02
ETA: 0s Left: 13 AVG: 0.00s  local:4/1/100%/0.0s 202
S3 Path Consists CSV Files,Proceeding to next step folder2:+2019-10-02
ETA: 6s Left: 12 AVG: 0.50s  local:4/2/100%/0.5s 202
S3 Path Consists CSV Files,Proceeding to next step folder3:+2019-10-02
ETA: 3s Left: 11 AVG: 0.33s  local:4/3/100%/0.3s 202
Academic Tradition requires you to cite works you base your article on.
[... rest of the citation notice and job header, as in the first run ...]
S3 Path Consists CSV Files,Proceeding to next step folder1:+2019-10-03
ETA: 0s Left: 13 AVG: 0.00s  local:4/1/100%/1.0s 202
S3 Path Consists CSV Files,Proceeding to next step folder2:+2019-10-03
ETA: 0s Left: 12 AVG: 0.00s  local:4/2/100%/0.5s 202
S3 Path Consists CSV Files,Proceeding to next step folder3:+2019-10-03
ETA: 0s Left: 11 AVG: 0.00s  local:4/3/100%/0.3s 202

$

Thanks

Solution

If checkS3Path works when you run it manually, then you probably just need:

export s3Bucket=testbucket
export version=v20

Each GNU Parallel job runs in its own shell (started from Perl), which is why you need to export the variables if you want them to be visible to the jobs.
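
The effect can be reproduced without GNU Parallel or AWS at all. The following is a minimal sketch that uses a plain `bash -c` child shell as a stand-in for a Parallel job; `show_prefix` is a hypothetical helper (not part of the original script) that builds a path the same way checkS3Path does.

```shell
#!/bin/bash
# Start clean so the demonstration does not depend on the caller's environment.
unset s3Bucket version

# Hypothetical helper mirroring how checkS3Path builds its S3 prefix.
show_prefix() {
  echo "s3://${s3Bucket}/${version}/$1/"
}
export -f show_prefix   # the function is exported, as in the question

s3Bucket=testbucket     # plain shell variables: not passed to child shells
version=v20
unexported=$(bash -c 'show_prefix folder1')

export s3Bucket version # now part of the environment that child shells inherit
exported=$(bash -c 'show_prefix folder1')

echo "without export: ${unexported}"
echo "with export:    ${exported}"
```

Before the export, the child shell sees the function but expands both variables to empty strings, so it prints `s3:////folder1/`; after the export it prints `s3://testbucket/v20/folder1/`. In the original script the same thing would happen to the path passed to `aws s3 ls`.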

See also env_parallel, which does this automatically.
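
As a rough sketch of what that could look like, assuming GNU Parallel is installed and ships `env_parallel.bash` on the PATH: sourcing that file defines an `env_parallel` shell function that copies the current shell's variables and functions into each job, so neither `export` nor `export -f` is needed. `show_prefix` is again a hypothetical stand-in for the real work, and the folder names and date are only sample arguments.

```shell
#!/bin/bash
s3Bucket=testbucket   # deliberately not exported
version=v20

# Hypothetical stand-in for finalstep/checkS3Path: just prints the S3 prefix.
show_prefix() {
  echo "s3://${s3Bucket}/${version}/$1/date=$2/"
}

if command -v env_parallel.bash >/dev/null 2>&1; then
  # Load the env_parallel function into this shell.
  . "$(command -v env_parallel.bash)"
  # Each job receives one folder and one date as $1 and $2.
  env_parallel --jobs 4 show_prefix ::: folder1 folder2 folder3 ::: 2019-10-01
else
  echo "env_parallel.bash not found; install GNU Parallel to try this" >&2
fi
```

Exporting the two variables explicitly, as shown above, remains the smaller change; env_parallel is mainly convenient when a script has many variables and functions that the jobs depend on.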
