
Python: problem with a script that generates GFF summary files


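I have very little experience with Python programming and have only just started learning this scripting language. I have a script and a bunch of .gff genome files, and I am using the script to summarise the information in my GFF files.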

#!/usr/bin/env python2

from Bio.SeqIO.FastaIO import SimpleFastaParser
from Bio.Seq import translate
from Bio.Seq import reverse_complement
import os
import string
import random
import sys

'''Given all the GFF files, summarise them to create one big CSV file with
the details of these genomes. The details:
1. Number of CDSs
2. Number of pseudogenes
3. Number of other elements in the GFF files
4. Genome size
After the summary: add ST, pathotype and metadata, see how stratifying the
data changes anything, and give a general description of the pan-genome.'''


def read_gff(gff_file):
    categories = ["hypothetical protein", "transposase", "pseudogene",
                  "conjuga", "phage", "fimbrial", "plasmid", "crispr",
                  "resistance", "virulence", "secretion system"]
    counts = {}
    for cat in categories:
        counts[cat] = 0
    try:
        f = open(gff_file)
    except IOError:
        print("Could not read file:", gff_file)
        return counts

    for line in f:
        line = line.lower()
        if line.startswith("##fasta"):
            break
        if line.startswith("#"):
            continue
        toks = line.strip().split()
        product = toks[2]
        if product not in counts:
            counts[product] = 0
        counts[product] += 1
        for cat in categories:
            if cat in line:
                counts[cat] += 1

    f.close()

    return counts

header = ["cds","trna","hypothetical protein","secretion system"]

out = open("gff_summary.csv","w")
out.write("ID,file_name," + ",".join(header) + "\n")

cnt = 0
with open(sys.argv[1]) as f:
    for line in f:
        toks = line.strip().split("\t")
        if line.startswith("ID"):
            annot_loc_index = toks.index("Annotation_Location")
            continue
        ID = toks[0]
        files = toks[annot_loc_index].split(",")
        for f1 in files:
            print(f1)
            counts = read_gff(f1)
            out.write(ID + "," + f1)
            for cat in header:
                out.write("," + str(counts[cat]))
            out.write("\n")
        cnt += 1


out.close()

I ran the script with the following command. I tried both a wildcard and a single file, but neither worked.

root@h:/home/fan/monas/script/gff_summaries# python summarise_gffs.py /home/fuan/monas/gff_combine/*.gff 

But the following error keeps appearing.

Traceback (most recent call last):
  File "summarise_gffs.py", line 71, in <module>
    files = toks[annot_loc_index].split(",")
NameError: name 'annot_loc_index' is not defined

Could someone please help me understand this error? What is toks, and how do I fix it?

Thanks
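
A note on where the error seems to come from, judging only from the code above. `toks` is just the list of tab-separated fields on the current line. `annot_loc_index` is assigned only when a line starts with "ID", so the script appears to expect `sys.argv[1]` to be a tab-separated metadata table whose header line starts with "ID" and has an "Annotation_Location" column listing the GFF paths. With `*.gff`, the shell expands the wildcard and `sys.argv[1]` becomes the first GFF file, which has no such header, so the name is never defined. The sketch below shows that check in isolation with a clearer error message. The function name `find_annotation_column`, the sample table, and the `/path/to/...` paths are hypothetical and not part of the original script.

```python
import io


def find_annotation_column(handle, column_name="Annotation_Location"):
    """Return the index of column_name in the header line of a TSV file.

    Raises ValueError with a readable message instead of leaving the index
    undefined, which is what leads to the NameError in the question.
    """
    first_line = handle.readline()
    toks = first_line.rstrip("\n").split("\t")
    if not first_line.startswith("ID") or column_name not in toks:
        raise ValueError(
            "expected a tab-separated metadata table whose header starts "
            "with 'ID' and contains %r, but the first line was: %r"
            % (column_name, first_line.strip())
        )
    return toks.index(column_name)


# Hypothetical metadata table of the kind the script seems to expect.
metadata = io.StringIO(
    u"ID\tST\tAnnotation_Location\n"
    u"genome1\t131\t/path/to/a.gff,/path/to/b.gff\n"
)
idx = find_annotation_column(metadata)
rows = [line.rstrip("\n").split("\t") for line in metadata]
gff_paths = rows[0][idx].split(",")

# A GFF file passed directly: its first line is a '##gff-version' pragma,
# so no header is found and the problem is reported up front.
gff_like = io.StringIO(
    u"##gff-version 3\nchr1\tprokka\tCDS\t1\t90\t.\t+\t0\tID=x\n"
)
try:
    find_annotation_column(gff_like)
    header_error = None
except ValueError as exc:
    header_error = str(exc)
```

If this reading is right, the script would be run with the path to that metadata table as its single argument rather than with the GFF files themselves.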
