Snakemake: two variables in the input, one variable in the output

How to handle two variables in the input and one variable in the output in Snakemake

I want to rename and move my fastq.gz files. They are currently named:

NAME-BOB_S1_L001_R1_001.fastq.gz
NAME-BOB_S1_L001_R2_001.fastq.gz
NAME-JOHN_S2_L001_R1_001.fastq.gz
NAME-JOHN_S2_L001_R2_001.fastq.gz

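I want them renamed and moved to:

NAME_BOB/reads/NAME_BOB.R1.fastq.gz
NAME_BOB/reads/NAME_BOB.R2.fastq.gz
NAME_JOHN/reads/NAME_JOHN.R1.fastq.gz
NAME_JOHN/reads/NAME_JOHN.R2.fastq.gz

The problem I have in my code is the second variable, S (the S1/S2 part of the input file names). I do not know how to specify it in the code, because I do not need it in my output file names.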

Solution

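There are several problems in your code. First, the {dir} in the output and the {dir} in the input are two different variables. The {dir} in the output is a wildcard, whereas the {dir} in the input is a parameter of the expand function (and you forgot to call that function at all, which is the second problem).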
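To make the distinction concrete, here is a minimal plain-Python sketch of what expand does with its keyword arguments. It is an imitation for illustration only: expand_like is a hypothetical helper, not Snakemake's implementation. Inside expand, {dir} is filled in immediately from the list passed as dir=, while the same text outside expand stays a pattern whose {dir} is a wildcard filled in per job.

```python
from itertools import product

def expand_like(pattern, **values):
    """Hypothetical stand-in for Snakemake's expand: fill every combination."""
    keys = list(values)
    return [
        pattern.format(**dict(zip(keys, combo)))
        for combo in product(*(values[k] for k in keys))
    ]

DIR = ["BOB", "JOHN"]

# Inside expand, {dir} and {r} are parameters: they are replaced right away.
targets = expand_like(
    "NAME_{dir}/reads/NAME_{dir}.{r}.fastq.gz", dir=DIR, r=["R1", "R2"]
)
for path in targets:
    print(path)

# Outside expand, the same text is only a pattern; {dir} is still a wildcard.
output_pattern = "NAME_{dir}/reads/NAME_{dir}.{r}.fastq.gz"
```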

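The third problem is that the shell section should contain a single command. You could try mv {input.fastq1} {output.fastq1}; mv {input.fastq2} {output.fastq2}, but that is not the idiomatic solution. It is better to write a rule that produces a single file and let Snakemake do the rest of the work.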
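One way to picture "letting Snakemake do the rest": every requested file is matched against the rule's output pattern, and the matched pieces become the wildcards of one job. The sketch below imitates that matching with a hand-written regular expression. It is a simplification under stated assumptions: match_output and OUTPUT_REGEX are hypothetical names, and the regex is written by hand for this one pattern (each wildcard matches .+ and the repeated {dir} must agree with its first occurrence).

```python
import re

# Hand-written regex for "NAME_{dir}/reads/NAME_{dir}.{r}.fastq.gz".
# The second {dir} is a backreference, so both occurrences must be equal.
OUTPUT_REGEX = re.compile(
    r"NAME_(?P<dir>.+)/reads/NAME_(?P=dir)\.(?P<r>.+)\.fastq\.gz$"
)

def match_output(target):
    """Return the wildcard values for one target, or None if it does not match."""
    m = OUTPUT_REGEX.match(target)
    return m.groupdict() if m else None

# Each requested file yields its own set of wildcards, hence its own job.
for target in [
    "NAME_BOB/reads/NAME_BOB.R1.fastq.gz",
    "NAME_JOHN/reads/NAME_JOHN.R2.fastq.gz",
]:
    print(target, match_output(target))
```

With a rule that produces one file, the four requested targets simply become four independent jobs, and no rule has to move two files at once.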

Finally, the value of S depends entirely on the value of DIR, so it becomes a function of {dir} and can be resolved with a lambda in the input:

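workdir: "/path/to/workdir/"

DIR = ["BOB", "JOHN"]
# S depends only on the sample name, so a plain lookup table is enough.
dir2s = {"BOB": "S1", "JOHN": "S2"}

rule all:
    input:
        expand("NAME_{dir}/reads/NAME_{dir}.{r}.fastq.gz", dir=DIR, r=["R1", "R2"])

# One job per output file: the input function builds the full source path
# from the job's wildcards and looks S up from dir.
rule rename:
    input:
        lambda wildcards: "fastq/NAME-{dir}_{s}_L001_{r}_001.fastq.gz".format(
            dir=wildcards.dir, s=dir2s[wildcards.dir], r=wildcards.r
        )
    output:
        "NAME_{dir}/reads/NAME_{dir}.{r}.fastq.gz"
    shell:
        """
        mv {input} {output}
        """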

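The lookup performed by the input function can be checked outside Snakemake. The following is a minimal sketch, assuming the same dir2s table as above; rename_input is a hypothetical name for the function, and SimpleNamespace is only a stand-in for the wildcards object that Snakemake would pass. The function fills dir and r explicitly from the wildcards, so it returns a complete path.

```python
from types import SimpleNamespace

dir2s = {"BOB": "S1", "JOHN": "S2"}

def rename_input(wildcards):
    """Build the source path for one job; S is looked up from the dir wildcard."""
    return "fastq/NAME-{dir}_{s}_L001_{r}_001.fastq.gz".format(
        dir=wildcards.dir, s=dir2s[wildcards.dir], r=wildcards.r
    )

# SimpleNamespace stands in for the wildcards object Snakemake would pass.
wc = SimpleNamespace(dir="JOHN", r="R2")
print(rename_input(wc))
```

A named function like this could be used in place of the lambda (input: rename_input), which may be easier to read once the mapping grows.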