如何解决蛇形 - 缺少所有规则的输入文件
我正在尝试创建一个管道,该管道将在 config.yml
中采用用户配置的目录(他们从 BaseSpace 下载了 .fastq.gz 文件的项目目录),以在序列文件上运行 fastqc。我已经有了按通道合并 fastqs 并在合并文件上运行 fastqc 的下游步骤。
但是,通配符给我带来了在原始 basespace 文件上运行 fastqc 的问题。以下是我尝试运行 snakemake 时的错误。
Missing input files for rule all:
qc/fastqc_premerge/DEX-13_S9_L001_ngc1838-10_L001_ds.9fd1f6dff0df47ab821125aab07be69b_r1_fastqc.zip
qc/fastqc_premerge/BOMB-3-2-19D_S8_L002_ngc1838-8_L002_ds.b81c308d62ba447b8caf074ffb27917e_r1_fastqc.zip
qc/fastqc_premerge/DEX-13_S9_L002_ngc1838-10_L002_ds.6369bc71fac44f00931eecb9b0a45d59_r1_fastqc.zip
任何建议将不胜感激。以下是重现此问题的最少代码。
import glob
configfile: "config.yaml"
wildcard_constraints:
bsdir = '\w+_L\d+_ds.\w+',lanenum = '\d+'
inputdirectory=config["directory"]
DIRECTORY,SAMPLES,LANENUMS = glob_wildcards(inputdirectory+"/{bsdir}/{sample}_L{lanenum}_R1_001.fastq.gz")
DIRECTORY,LANENUMS = glob_wildcards(inputdirectory+"/{bsdir}/{sample}_L{lanenum}_R2_001.fastq.gz")
##### target rules #####
rule all:
input:
#expand('qc/fastqc_premerge/{sample}_L{lanenum}_{bsdir}_r1_fastqc.zip',sample=SAMPLES,bsdir=DIRECTORY,lanenum=LANENUMS)
expand('qc/fastqc_premerge/{sample}_L{lanenum}_{bsdir}_r1_fastqc.zip',zip,lanenum=LANENUMS) ##Changed to this from commenters suggestion,however,snakemake still wont run
rule fastqc_premerge_r1:
input:
f"{config['directory']}/{{bsdir}}/{{sample}}_L{{lanenum}}_R1_001.fastq.gz"
output:
html="qc/fastqc_premerge/{sample}_L{lanenum}_{bsdir}_r1.html",zip="qc/fastqc_premerge/{sample}_L{lanenum}_{bsdir}_r1_fastqc.zip" # the suffix _fastqc.zip is necessary for multiqc to find the file. If not using multiqc,you are free to choose an arbitrary filename
params: ""
log:
"logs/fastqc_premerge/{sample}_L{lanenum}_{bsdir}_r1.log"
threads: 1
wrapper:
"v0.69.0/bio/fastqc"
目录结构:
ngc1838-10_L001_ds.9fd1f6dff0df47ab821125aab07be69b/DEX-13_S9_L001_R1_001.fastq.gz
ngc1838-10_L001_ds.9fd1f6dff0df47ab821125aab07be69b/DEX-13_S9_L001_R2_001.fastq.gz
ngc1838-10_L002_ds.6369bc71fac44f00931eecb9b0a45d59/DEX-13_S9_L002_R1_001.fastq.gz
ngc1838-10_L002_ds.6369bc71fac44f00931eecb9b0a45d59/DEX-13_S9_L002_R2_001.fastq.gz
ngc1838-8_L002_ds.b81c308d62ba447b8caf074ffb27917e/BOMB-3-2-19D_S8_L002_R1_001.fastq.gz
ngc1838-8_L002_ds.b81c308d62ba447b8caf074ffb27917e/BOMB-3-2-19D_S8_L002_R2_001.fastq.gz
在上面的例子中,我想在所有 6 个输入 R1/R2 文件上运行 fastqc,然后在下游,为 DEX_13_S9(用于合并两个输入)和 BOMB-3_2_19D(这将是一个副本)创建一个合并文件1 个输入)。然后针对这些生成的 R1 和 R2 文件创建 4 个 fastqc 报告。
编辑:我必须更改以下内容才能运行snakemake
inputdirectory=config["directory"]
PROJECTDIR,RANDOMINT,LANENUM1,BSSTRINGS,LANENUMS = glob_wildcards(inputdirectory+"/{proj}-{randint}_L{lanenum1}_ds.{bsstring}/{sample}_L{lanenum}_R1_001.fastq.gz",followlinks=True)
PROJECTDIR,LANENUMS = glob_wildcards(inputdirectory+"/{proj}-{randint}_L{lanenum1}_ds.{bsstring}/{sample}_L{lanenum}_R2_001.fastq.gz",followlinks=True)
##### target rules #####
rule all:
input:
"qc/multiqc_report_premerge.html"
rule fastqc_premerge_r1:
input:
f"{config['directory']}/{{proj}}-{{randint}}_L{{lanenum1}}_ds.{{bsstring}}/{{sample}}_L{{lanenum}}_R1_001.fastq.gz"
output:
html="qc/fastqc_premerge/{sample}_L{lanenum}_{proj}-{randint}_L{lanenum1}_ds.{bsstring}_r1.html",zip="qc/fastqc_premerge/{sample}_L{lanenum}_{proj}-{randint}_L{lanenum1}_ds.{bsstring}_r1_fastqc.zip" # the suffix _fastqc.zip is necessary for multiqc
params: ""
log:
"logs/fastqc_premerge/{sample}_L{lanenum}_{proj}-{randint}_L{lanenum1}_ds.{bsstring}_r1.log"
threads: 1
wrapper:
"v0.69.0/bio/fastqc"
rule fastqc_premerge_r2:
input:
f"{config['directory']}/{{proj}}-{{randint}}_L{{lanenum1}}_ds.{{bsstring}}/{{sample}}_L{{lanenum}}_R2_001.fastq.gz"
output:
html="qc/fastqc_premerge/{sample}_L{lanenum}_{proj}-{randint}_L{lanenum1}_ds.{bsstring}_r2.html",zip="qc/fastqc_premerge/{sample}_L{lanenum}_{proj}-{randint}_L{lanenum1}_ds.{bsstring}_r2_fastqc.zip" # the suffix _fastqc.zip is necessary for multiqc
params: ""
log:
"logs/fastqc_premerge/{sample}_L{lanenum}_{proj}-{randint}_L{lanenum1}_ds.{bsstring}_r2.log"
threads: 1
wrapper:
"v0.69.0/bio/fastqc"
rule multiqc_pre:
input:
expand("qc/fastqc_premerge/{sample}_L{lanenum}_{proj}-{randint}_L{lanenum1}_ds.{bsstring}_r1_fastqc.zip",lanenum=LANENUMS,proj=PROJECTDIR,randint=RANDOMINT,lanenum1=LANENUM1,bsstring=BSSTRINGS),expand("qc/fastqc_premerge/{sample}_L{lanenum}_{proj}-{randint}_L{lanenum1}_ds.{bsstring}_r2_fastqc.zip",bsstring=BSSTRINGS)
output:
"qc/multiqc_report_premerge.html"
log:
"logs/multiqc_premerge.log"
wrapper:
"0.62.0/bio/multiqc"
解决方法
它会告诉您缺少哪些文件;这就是“缺少所有规则的输入文件”下的行。
话虽如此,为了回答您的原始问题,如果您进行试运行,那应该会告诉您要在您的文件中运行的每个计划规则的输入/输出文件是什么(使用标志 -n -r
)运行命令。
在您的 FrameObserverView
中,您有:
rule all
这应该生成 SAMPLES、DIRECTORY 和 LANENUMS 的所有组合。这是你想要的吗?我怀疑不是,因为这意味着所有样本都在所有目录中并且它们在所有通道上。也许您希望 expand('qc/fastqc_premerge/{sample}_L{lanenum}_{bsdir}_r1_fastqc.zip',sample=SAMPLES,bsdir=DIRECTORY,lanenum=LANENUMS)
函数扩展列表:
zip
版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。