Snakemake: How to run a shell command with different integer arguments in a rule?

I am trying to find the best hyperparameters for training my boosted decision tree (BDT). Here is the code for two instances:

user = '/home/.../BDT/'

nestimators = [1,2]

rule all:
        input: user + 'AUC_score.pdf'

rule testing:
        output: user + 'AUC_score.csv'
        shell: 'python bdt.py --nestimators {}'.format(nestimators[i] for i in range(2))

rule plotting:
        input: user + 'AUC_score.csv'
        output: user + 'AUC_score.pdf'
        shell: 'python opti.py'

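The plan is as follows: I want to parallelize the training of my BDT over a range of different hyperparameters (to begin with, only nestimators). So I tried to use a shell command to train the BDT. bdt.py takes the training parameters, trains, and saves the hyperparameters plus the training score to a csv file. In the csv file I can then see which hyperparameters give the best score. Yay!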

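Unfortunately it does not work that way. I tried using an input function, but since I want to pass an integer, that did not work. I then tried it the way you see above, but now I get an error message: "python bdt.py --nestimators ". I understand why this does not work either, but I do not know where to go from here.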

Solution

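The error occurs because {} is replaced by a generator object. In other words, it is not replaced first by 1 and then by 2; it is replaced by the iterator over nestimators itself.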
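To see this outside Snakemake, the following minimal sketch evaluates the same expression in plain Python. It only reuses the nestimators list from the question; the name command is introduced here for illustration.

```python
nestimators = [1, 2]

# A generator expression passed to str.format() is inserted via its repr;
# it is not expanded into its elements, and format() is called only once.
command = 'python bdt.py --nestimators {}'.format(nestimators[i] for i in range(2))

# Prints something like:
# python bdt.py --nestimators <generator object <genexpr> at 0x...>
print(command)
```

The memory address in the printed text differs from run to run, but the string never contains 1 or 2.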

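Even once you fix the Python expression in rule testing, there may be a more fundamental problem, if I understand your goal correctly. Snakemake workflows are defined in terms of rules that describe how to create output files from input files. As written, rule testing would therefore run only once, whereas you probably want it to run separately for each hyperparameter.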

The solution is to include the hyperparameter in the output file name, like this:

user = '/home/.../BDT/'

nestimators = [1,2]

rule all:
        input: user + 'AUC_score.pdf'

rule testing:
        output: user + 'AUC_score_{hyper}.csv'
        shell: 'python bdt.py --nestimators {wildcards.hyper}'

rule plotting:
        input: expand(user + 'AUC_score_{hyper}.csv',hyper=nestimators)
        output: user + 'AUC_score.pdf'
        shell: 'python opti.py'
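To make the role of expand more concrete, here is a rough plain-Python stand-in for the call in rule plotting. It is not Snakemake's implementation, only a sketch of the list of file names that call yields for a single wildcard; pattern and requested are names made up for this example.

```python
user = '/home/.../BDT/'
nestimators = [1, 2]

# Stand-in for: expand(user + 'AUC_score_{hyper}.csv', hyper=nestimators)
pattern = user + 'AUC_score_{hyper}.csv'
requested = [pattern.format(hyper=h) for h in nestimators]

# One file name per hyperparameter value.
print(requested)
```

Each of these file names matches the output pattern of rule testing, so Snakemake runs that rule once per file, with wildcards.hyper set to 1 and 2 respectively. Because the two jobs do not depend on each other, they can run in parallel.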

Finally, instead of calling the Python script through shell:, you can use the script: directive directly, as described in the documentation: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#external-scripts
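As a hedged sketch of what such a script might look like: when a file is run through script:, Snakemake makes a global snakemake object available, from which the wildcards and output paths can be read. The function write_score, the CSV column names, and the NaN placeholder score are all hypothetical, since the actual training code in bdt.py is not shown in the question.

```python
import csv
import math


def write_score(n_estimators, out_path, score=math.nan):
    """Write one hyperparameter/score row to a CSV file.

    `score` defaults to NaN as a placeholder; the real training step
    from bdt.py would compute it (assumption, not shown in the question).
    """
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["n_estimators", "auc"])
        # Wildcard values arrive as strings, so convert explicitly.
        writer.writerow([int(n_estimators), score])


# The global `snakemake` object exists only when this file is executed
# through a rule's `script:` directive; otherwise this branch is skipped.
if "snakemake" in globals():
    write_score(snakemake.wildcards.hyper, snakemake.output[0])
```

With this approach, rule testing would use script: followed by the path to this file in place of its shell: line, and no command-line parsing is needed.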

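The problem in your code is that the expression nestimators[i] for i in range(2) is not a list (as you might think). It is a generator, which does not produce any values until you explicitly consume it. For example, this code: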

'python bdt.py --nestimators {}'.format(list(nestimators[i] for i in range(2)))

produces the result 'python bdt.py --nestimators [1, 2]'

In fact you do not need the generator at all, because this code produces exactly the same output:

'python bdt.py --nestimators {}'.format(nestimators)

This format may not be what your script expects. If, for example, you want a command line like python bdt.py --nestimators 1,2, you can use this expression:

'python bdt.py --nestimators {}'.format(",".join(map(str,nestimators)))

If you can use f-strings, the last expression can be simplified:

f'python bdt.py --nestimators {",".join(map(str, nestimators))}'
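The formatting variants discussed in this answer can be compared side by side in plain Python. This is only an illustrative sketch; the variable names as_list, as_joined, and as_fstring are made up here.

```python
nestimators = [1, 2]

# Formatting the list directly inserts its repr, brackets and spaces included.
as_list = 'python bdt.py --nestimators {}'.format(nestimators)

# Joining the stringified values gives a comma-separated argument.
as_joined = 'python bdt.py --nestimators {}'.format(",".join(map(str, nestimators)))

# The same join written inside an f-string.
as_fstring = f'python bdt.py --nestimators {",".join(map(str, nestimators))}'

print(as_list)
print(as_joined)
print(as_fstring)
```

Note that all three variants still build a single command containing every value, so bdt.py would have to parse the list itself. They do not run the command once per value.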
