How to avoid the "missing input files" error with Snakemake's expand function
When I run the following Snakemake code, I get a MissingInputException:
import re
import os

glob_vars = glob_wildcards(os.path.join(os.getcwd(),"inputs","{fileName}.{ext}"))

rule end:
    input:
        expand(os.path.join(os.getcwd(),"{fileName}_rename.fas"),fileName=glob_vars.fileName)

rule rename:
    '''
    rename fasta file to avoid problems
    '''
    input:
        expand("inputs/{{fileName}}.{ext}",ext=glob_vars.ext)
    output:
        os.path.join(os.getcwd(),"{fileName}_rename.fas")
    run:
        list_ = []
        with open(str(input)) as f2:
            line = f2.readline()
            while line:
                while not line.startswith('>') and line:
                    line = f2.readline()
                fas_name = re.sub(r"\W","_",line.strip())
                list_.append(fas_name)
                fas_seq = ""
                line = f2.readline()
                while not line.startswith('>') and line:
                    fas_seq += re.sub(r"\s","",line)
                    line = f2.readline()
                list_.append(fas_seq)
        with open(str(output),"w") as f:
            f.write("\n".join(list_))
My inputs folder contains the following files:
G.bullatarudis.fasta
goldfish_protein.faa
guppy_protein.faa
gyrodactylus_salaris.fasta
protopolystoma_xenopodis.fa
salmon_protein.faa
schistosoma_mansoni.fa
The error message is:
Building DAG of jobs...
MissingInputException in line 10 of /home/zhangdong/works/NCBI/BLAST/RHB/test.rule:
Missing input files for rule rename:
inputs/guppy_protein.fasta
inputs/guppy_protein.fa
I assume the error is caused by the expand function: only guppy_protein.faa exists, but expand also generates guppy_protein.fasta and guppy_protein.fa as required inputs. Is there a way around this?
Solution
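By default, expand generates every combination of its input lists, so this is expected behavior. You need an input function that looks up the correct extension for a given fileName. I have not tested this: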
glob_vars = glob_wildcards(os.path.join(os.getcwd(),"inputs","{fileName}.{ext}"))

# create a dict to lookup extensions given fileNames
glob_vars_dict = {fname: ex for fname,ex in zip(glob_vars.fileName,glob_vars.ext)}

def rename_input(wildcards):
    ext = glob_vars_dict[wildcards.fileName]
    return f"inputs/{wildcards.fileName}.{ext}"

rule rename:
    input: rename_input
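As a rough illustration in plain Python (no Snakemake required), the sketch below mimics what expand does with one file name and several extensions, and contrasts it with a per-name lookup like the dict above. The helper names all_combinations and matching_input are made up for this example, and itertools.product only stands in for expand's default combination behavior; it is not Snakemake's own code.

```python
from itertools import product

# File names and extensions paired up the way glob_wildcards returns them
# (three of the files from the question).
file_names = ["guppy_protein", "schistosoma_mansoni", "G.bullatarudis"]
exts = ["faa", "fa", "fasta"]


def all_combinations(file_name, extensions):
    """Hypothetical stand-in for expand("inputs/{{fileName}}.{ext}", ext=...):
    one candidate path per extension, whether or not the file exists."""
    return [f"inputs/{name}.{ext}" for name, ext in product([file_name], extensions)]


# Pairwise lookup table, built the same way as glob_vars_dict in the answer.
ext_by_name = dict(zip(file_names, exts))


def matching_input(file_name):
    """Return only the path whose extension was actually seen for this name."""
    return f"inputs/{file_name}.{ext_by_name[file_name]}"


combos = all_combinations("guppy_protein", exts)
single = matching_input("guppy_protein")
```

Here combos holds three candidate paths for guppy_protein, two of which do not exist on disk, while single holds only inputs/guppy_protein.faa. That is the difference between the failing rule and the input-function version.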
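Some unsolicited style comments:
- You don't have to prefix the glob_wildcards pattern with os.getcwd; glob_wildcards("inputs/{fileName}.{ext}") should work, because Snakemake uses paths relative to the working directory by default.
- Try to stick with snake_case rather than camelCase for variable names in Python.
- In this case, fileName is not a good description of what you are capturing. Maybe species_name or species would be clearer.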
Thanks to Troy Comi, I modified my code and it worked:
import re
import os
import itertools

speciess,exts = glob_wildcards(os.path.join(os.getcwd(),"inputs_test","{species}.{ext}"))

rule end:
    input:
        expand("inputs_test/{species}_rename.fas",species=speciess)

def required_files(wildcards):
    list_combination = itertools.product([wildcards.species],list(set(exts)))
    exist_file = ""
    for file in list_combination:
        if os.path.exists(f"inputs_test/{'.'.join(file)}"):
            exist_file = f"inputs_test/{'.'.join(file)}"
    return exist_file

rule rename:
    '''
    rename fasta file to avoid problems
    '''
    input:
        required_files
    output:
        "inputs_test/{species}_rename.fas"
    run:
        list_ = []
        with open(str(input)) as f2:
            line = f2.readline()
            while line:
                while not line.startswith('>') and line:
                    line = f2.readline()
                fas_name = ">" + re.sub(r"\W","_",line.replace(">","").strip())
                list_.append(fas_name)
                fas_seq = ""
                line = f2.readline()
                while not line.startswith('>') and line:
                    fas_seq += re.sub(r"\s","",line)
                    line = f2.readline()
                list_.append(fas_seq)
        with open(str(output),"w") as f:
            f.write("\n".join(list_))
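One thing to be aware of: required_files above returns an empty string when no candidate file exists, and keeps the last match if several extensions exist for the same species. A possible variant, sketched here in plain Python so it can be tried outside Snakemake, returns the first existing candidate and fails loudly otherwise. The function name find_existing_input and its parameters are hypothetical, not part of Snakemake or of the code above.

```python
import os
import tempfile


def find_existing_input(directory, species, extensions):
    """Return the first existing "<directory>/<species>.<ext>" path.

    Extensions are de-duplicated and sorted so the result is deterministic.
    Raises FileNotFoundError instead of silently returning an empty string.
    """
    for ext in sorted(set(extensions)):
        candidate = os.path.join(directory, f"{species}.{ext}")
        if os.path.exists(candidate):
            return candidate
    raise FileNotFoundError(f"no input file found for {species!r} in {directory!r}")


# Small self-check in a temporary directory standing in for inputs_test/.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "guppy_protein.faa"), "w").close()
    found = find_existing_input(tmp, "guppy_protein", ["fa", "faa", "fasta", "faa"])
    found_name = os.path.basename(found)
```

Inside a Snakefile, an input function could wrap this as `lambda wildcards: find_existing_input("inputs_test", wildcards.species, exts)`, assuming exts comes from glob_wildcards as in the code above.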