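How to work around looping a snakemake rule over multiple samples
I've started using snakemake to build a workflow that runs bwa mem over a list of samples. Below is the structure of my config file, with the actual data.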
sample1:
- argument: 0
  log: /home/nsm/Desktop/NSM/nsm-backend/media/user_2_animesh_singh21/file/6e8c3cdf-1cad-4426-a005-91d3c4d5f691/sample1/sample1_fcl1_lane1_aln.log
  name: fcl1_lane1
  output: /home/nsm/Desktop/NSM/nsm-backend/media/user_2_animesh_singh21/file/6e8c3cdf-1cad-4426-a005-91d3c4d5f691/sample1/sample1_fcl1_lane1_aln.sam
  params: '-T 1 -M -R ''@RG\tID:POP1_L3\tPL:Illumina\tPU:D0AW3ACXX.3\tLB:POP1.TruSeq\tSM:POP1'' '
  read1: /home/nsm/Desktop/NSM/nsm-backend/media/user_2_animesh_singh21/file/POP1_R1.fastq
  read2: /home/nsm/Desktop/NSM/nsm-backend/media/user_2_animesh_singh21/file/POP1_R2.fastq
  reference: /home/nsm/Desktop/NSM/nsm-backend/media/global_space/NCBI37_DECOY.fa
- argument: 0
  log: /home/nsm/Desktop/NSM/nsm-backend/media/user_2_animesh_singh21/file/6e8c3cdf-1cad-4426-a005-91d3c4d5f691/sample1/sample1_fcl1_lane2_aln.log
  name: fcl1_lane2
  output: /home/nsm/Desktop/NSM/nsm-backend/media/user_2_animesh_singh21/file/6e8c3cdf-1cad-4426-a005-91d3c4d5f691/sample1/sample1_fcl1_lane2_aln.sam
  params: '-T 1 -M -R ''@RG\tID:POP1_L3\tPL:Illumina\tPU:D0AW3ACXX.3\tLB:POP1.TruSeq\tSM:POP1'' '
  read1: /home/nsm/Desktop/NSM/nsm-backend/media/user_2_animesh_singh21/file/POP1_R1.fastq
  read2: /home/nsm/Desktop/NSM/nsm-backend/media/user_2_animesh_singh21/file/POP1_R2.fastq
  reference: /home/nsm/Desktop/NSM/nsm-backend/media/global_space/NCBI37_DECOY.fa
- argument: 0
  log: /home/nsm/Desktop/NSM/nsm-backend/media/user_2_animesh_singh21/file/6e8c3cdf-1cad-4426-a005-91d3c4d5f691/sample1/sample1_fcl2_lane2_aln.log
  name: fcl2_lane2
  output: /home/nsm/Desktop/NSM/nsm-backend/media/user_2_animesh_singh21/file/6e8c3cdf-1cad-4426-a005-91d3c4d5f691/sample1/sample1_fcl2_lane2_aln.sam
  params: '-T 1 -M -R ''@RG\tID:POP1_L3\tPL:Illumina\tPU:D0AW3ACXX.3\tLB:POP1.TruSeq\tSM:POP1'' '
  read1: /home/nsm/Desktop/NSM/nsm-backend/media/user_2_animesh_singh21/file/POP1_R1.fastq
  read2: /home/nsm/Desktop/NSM/nsm-backend/media/user_2_animesh_singh21/file/POP1_R2.fastq
  reference: /home/nsm/Desktop/NSM/nsm-backend/media/global_space/NCBI37_DECOY.fa
Each entry of sample1 contains the keys ['reference','read1','read2','output','log','params','name','argument'].
I wrote the following Snakefile to run bwa mem:
sample_name = list(config.keys())[0]

rule all:
    message: "Generating looped output"
    input:
        config['output']

rule bwa_run:
    message: "Running BWA MEM"
    output: [i['output'] for i in config[sample_name]]
    threads: 4
    run:
        for i in config[sample_name]:
            shell(f"bwa mem {i['reference']} {i['read1']} {i['read2']} -t {threads} {i['params']} > {i['output']} 2> {i['log']}")
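This does the job just fine, but I know it won't run in parallel; the for loop prevents that. I need help figuring out how to convert this into a Snakefile that runs in parallel.
Looking at other answers on Stack Overflow, I tried the following.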
wildcard_constraints:
    counter = "\d+"

sample_name = list(config.keys())[0]

def input_all(wildcards):
    return [
        config[sample_name][int(wildcards.counter)]['reference'],
        config[sample_name][int(wildcards.counter)]['read1'],
        config[sample_name][int(wildcards.counter)]['read2'],
    ]

rule all:
    message: "Generating looped output"
    input:
        expand("{output}{counter}", counter = range(len(config[sample_name])), output = [i['output'] for i in config[sample_name]])

rule bwa_run:
    message: "running BWA MEM"
    input:
        input_all
    output: "{output}{counter}"
    run:
        shell("bwa mem {input} -t {threads} > {output}")
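It does work, but I have to use {counter} in both {output} and rule all, otherwise I can't access wildcards.counter. It also loops 3 times for each entry in sample1, and it changes my output file names. There has to be a better way to do this.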
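The 3-fold repetition probably comes from how expand combines its keyword arguments: by default it builds every combination of the supplied values, so 3 outputs and 3 counters yield 9 targets. The sketch below imitates that in plain Python with itertools.product; the short file names are hypothetical stand-ins for the real output paths.

```python
from itertools import product

# Hypothetical short names standing in for the three 'output' paths in the config.
outputs = ["fcl1_lane1_aln.sam", "fcl1_lane2_aln.sam", "fcl2_lane2_aln.sam"]
counters = range(len(outputs))

# What expand("{output}{counter}", ...) requests by default:
# every output paired with every counter (a Cartesian product).
all_combinations = [f"{o}{c}" for o, c in product(outputs, counters)]

# What was presumably intended: each output paired with its own index only.
one_per_entry = [f"{o}{c}" for o, c in zip(outputs, counters)]

print(len(all_combinations))  # number of targets requested by the product
print(len(one_per_entry))     # number of targets when pairing element-wise
```

Passing zip as the second positional argument to expand switches it to the element-wise pairing, although the counter would still be appended to every file name.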
Any ideas would be greatly appreciated. Thanks!
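One direction that might avoid the counter altogether is to index the config entries by their output path, so that an input function can find the matching entry from the requested file name alone. This is a minimal sketch in plain Python, not a tested Snakefile: the shortened paths, the entry_by_output dictionary, the bwa_inputs function and the wildcard name out are all assumptions made for illustration, and SimpleNamespace merely stands in for the wildcards object Snakemake would pass.

```python
from types import SimpleNamespace

# Hypothetical, shortened stand-in for the parsed config shown above.
config = {
    "sample1": [
        {"name": "fcl1_lane1", "output": "sample1/sample1_fcl1_lane1_aln.sam",
         "reference": "NCBI37_DECOY.fa", "read1": "POP1_R1.fastq", "read2": "POP1_R2.fastq"},
        {"name": "fcl1_lane2", "output": "sample1/sample1_fcl1_lane2_aln.sam",
         "reference": "NCBI37_DECOY.fa", "read1": "POP1_R1.fastq", "read2": "POP1_R2.fastq"},
        {"name": "fcl2_lane2", "output": "sample1/sample1_fcl2_lane2_aln.sam",
         "reference": "NCBI37_DECOY.fa", "read1": "POP1_R1.fastq", "read2": "POP1_R2.fastq"},
    ]
}

sample_name = list(config.keys())[0]

# Index every entry by its output path so it can be found without a counter.
entry_by_output = {entry["output"]: entry for entry in config[sample_name]}

def bwa_inputs(wildcards):
    """Return reference and reads for the entry whose output was requested.

    Assumes the rule's output pattern is a single wildcard named 'out'.
    """
    entry = entry_by_output[wildcards.out]
    return [entry["reference"], entry["read1"], entry["read2"]]

# The list a target rule such as 'rule all' would request: one file per entry.
all_targets = list(entry_by_output)

# Simulate the lookup for one requested output file.
wc = SimpleNamespace(out="sample1/sample1_fcl1_lane2_aln.sam")
print(bwa_inputs(wc))
```

With one output file per job, Snakemake could schedule the three alignments independently. Per-entry values such as params could be looked up the same way, since params also accepts a function of the wildcards.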