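How to solve a problem when using dynamic in Snakemake
I ran into something strange while using snakemake. Here is a simple example that shows the problem. The following Snakefile works (sample1.txt and sample2.txt are any small text files):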
samples = ['sample1','sample2']

rule end:
    input:
        merged = expand("{sample}_merged.txt",sample=samples)

rule blocking:
    output:
        blocking_input = "blocking_file.txt"
    shell:
        "echo 'blocking' > {output.blocking_input}"

rule split:
    input:
        text_file = "{sample}.txt",
        blocking_input = "blocking_file.txt"
    output:
        splitted_file = dynamic("{sample}_cut_{part}")
    params:
        prefix = "{sample}_cut_"
    shell:
        "split -l 3 {input.text_file} {params.prefix}"

rule rename:
    input:
        splitted_file = "{sample}_cut_{part}"
    output:
        renamed = "{sample}_renamed_{part}"
    shell:
        "mv {input.splitted_file} {output.renamed}"

rule merge:
    input:
        splitted_file = dynamic("{sample}_renamed_{part}")
    output:
        merged = "{sample}_merged.txt"
    params:
        prefix = "{sample}_renamed_"
    shell:
        "cat {params.prefix}* > {output.merged}"
But if the file "blocking_file.txt" is required by rule rename instead, the workflow does not create this file and stops without reporting any error:
samples = ['sample1','sample2']

rule end:
    input:
        merged = expand("{sample}_merged.txt",sample=samples)

rule blocking:
    output:
        blocking_input = "blocking_file.txt"
    shell:
        "echo 'blocking' > {output.blocking_input}"

rule split:
    input:
        text_file = "{sample}.txt"
    output:
        splitted_file = dynamic("{sample}_cut_{part}")
    params:
        prefix = "{sample}_cut_"
    shell:
        "split -l 3 {input.text_file} {params.prefix}"

rule rename:
    input:
        splitted_file = "{sample}_cut_{part}",
        blocking_input = "blocking_file.txt"
    output:
        renamed = "{sample}_renamed_{part}"
    shell:
        "mv {input.splitted_file} {output.renamed}"

rule merge:
    input:
        splitted_file = dynamic("{sample}_renamed_{part}")
    output:
        merged = "{sample}_merged.txt"
    params:
        prefix = "{sample}_renamed_"
    shell:
        "cat {params.prefix}* > {output.merged}"
[]$ workflow : snakemake -s bug_block.rules -c1
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job         count    min threads    max threads
--------  -------  -------------  -------------
blocking        1              1              1
end             1              1              1
merge           2              1              1
rename          2              1              1
split           2              1              1
total           8              1              1
Select jobs to execute...
[Thu Jul 22 17:08:14 2021]
rule split:
input: sample2.txt
output: sample2_cut_{*} (dynamic)
jobid: 7
wildcards: sample=sample2
resources: tmpdir=/tmp
Subsequent jobs will be added dynamically depending on the output of this job
Dynamically updating jobs
[Thu Jul 22 17:08:14 2021]
Finished job 7.
1 of 11 steps (9%) done
Select jobs to execute...
[Thu Jul 22 17:08:14 2021]
rule split:
input: sample1.txt
output: sample1_cut_{*} (dynamic)
jobid: 3
wildcards: sample=sample1
resources: tmpdir=/tmp
Subsequent jobs will be added dynamically depending on the output of this job
Dynamically updating jobs
[Thu Jul 22 17:08:14 2021]
Finished job 3.
2 of 13 steps (15%) done
Complete log: ...
The DAG looks fine to me.
Thanks for your suggestion, I managed to get it running using checkpoints:
import os  # explicit import for os.path.join in aggregate_input

samples = ['sample1','sample2']

rule final_output:
    input:
        merged = expand("{sample}_merged.txt",sample=samples)

# split each file into several ones
checkpoint split:
    input:
        text_file = "{sample}.txt"
    output:
        directory("{sample}_split")
    shell:
        """
        mkdir {output}
        split -l 3 {input.text_file} {output}/ ## / IS necessary
        """

# add extra file
rule blocking:
    output:
        blocking_input = "blocking_file.txt"
    shell:
        "echo 'blocking' > {output.blocking_input}"

# rename these unknown number of files
rule rename:
    input:
        splitted_file = "{sample}_split/{i}",
        blocking_input = "blocking_file.txt"
    output:
        renamed = "{sample}_renamed_{i}"
    shell:
        """
        sleep 2s
        mv {input.splitted_file} {output.renamed}
        """

# merge them together into one file per sample:
def aggregate_input(wildcards):
    # Requesting the checkpoint output makes Snakemake re-evaluate this
    # function once split has finished for the given sample.
    checkpoint_output = checkpoints.split.get(**wildcards).output[0]
    return expand("{{sample}}_renamed_{i}",
                  i=glob_wildcards(os.path.join(checkpoint_output,'{i}')).i)

rule merge:
    input:
        aggregate_input
    output:
        merged = "{sample}_merged.txt"
    shell:
        "cat {input} > {output.merged}"
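I'm not sure whether I am using the wildcards correctly, or whether the aggregate_input function is the best way to do this. I would also like to know whether it is possible to avoid creating a directory for the output of the checkpoint. I tried the {sample}_split_{i} format, but could not get it to run.
Thanks a lot!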
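For readers unsure what aggregate_input actually hands to rule merge, the following is a minimal plain-Python sketch of the same idea outside Snakemake. The helper names collect_parts and renamed_targets are hypothetical and only illustrate the logic; collect_parts is a rough stand-in for glob_wildcards on a flat directory, not a reimplementation of it.

```python
import os
import tempfile


def collect_parts(split_dir):
    """Return the sorted names of the files directly inside split_dir.

    Hypothetical stand-in for
    glob_wildcards(os.path.join(split_dir, "{i}")).i
    in the simple case where the directory has no subdirectories.
    """
    return sorted(
        name for name in os.listdir(split_dir)
        if os.path.isfile(os.path.join(split_dir, name))
    )


def renamed_targets(sample, split_dir):
    """Build the '{sample}_renamed_{i}' names that merge would request."""
    return [f"{sample}_renamed_{i}" for i in collect_parts(split_dir)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Pretend the checkpoint already produced three chunks.
        split_dir = os.path.join(tmp, "sample1_split")
        os.mkdir(split_dir)
        for part in ("aa", "ab", "ac"):
            with open(os.path.join(split_dir, part), "w") as fh:
                fh.write("x\n")
        print(renamed_targets("sample1", split_dir))
```

The point of the sketch is that the list of targets can only be built by looking at the directory after the split has run, which is why the input of merge is a function that first asks for the checkpoint output.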
Solution
You need to convert rule split into checkpoint split to make it work. See the documentation:
checkpoint split:
    input:
        text_file = "{sample}.txt"
    output:
        splitted_file = dynamic("{sample}_cut_{part}")
    params:
        prefix = "{sample}_cut_"
    shell:
        "split -l 3 {input.text_file} {params.prefix}"
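I'm also not sure whether dynamic might be deprecated. At least this entry in the changelog makes it look that way. There is not a single example of dynamic in the documentation.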
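As a small illustration of why the outputs of split cannot be listed before the job has run, here is a hedged Python sketch that predicts the file names `split -l 3` would be expected to create for a given number of input lines. The function split_output_names is hypothetical, and it assumes split's default two-letter alphabetic suffixes (aa, ab, ...).

```python
import math
from itertools import islice, product
from string import ascii_lowercase


def split_output_names(n_lines, prefix, lines_per_file=3):
    """Names `split -l lines_per_file FILE prefix` is expected to create.

    Assumption: default two-letter suffixes, so at most 26 * 26 chunks.
    """
    n_chunks = math.ceil(n_lines / lines_per_file)
    if n_chunks > 26 * 26:
        raise ValueError("too many chunks for two-letter suffixes")
    # aa, ab, ..., az, ba, ... in the order split assigns them
    suffixes = ("".join(pair) for pair in product(ascii_lowercase, repeat=2))
    return [prefix + suffix for suffix in islice(suffixes, n_chunks)]


if __name__ == "__main__":
    # A 4-line file yields two chunks, a 7-line file yields three.
    print(split_output_names(4, "sample1_cut_"))
    print(split_output_names(7, "sample2_cut_"))
```

The number of chunks depends on the length of each input file, so the values of {part} are only known once split has finished. That is the kind of data-dependent situation in which the DAG has to be re-evaluated after the job.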