微信公众号搜"智元新知"关注
微信扫一扫可直接关注哦!

蛋白质到 RNA 密码子

如何解决蛋白质到 RNA 密码子

我有这个问题,我们需要编写一个代码,它需要一个蛋白质 fasta 文件和蛋白质序列标识符,并计算 fasta 文件中序列的所有可能的 RNA 组合,条件是组合总数应该小于 5000。

我首先制作了一个 RNA 密码子字典,然后我制作了一个函数,将 fasta 文件的元素(氨基酸)放入一个列表中,然后我尝试从该列表中进行组合,但是我得到一个错误,我尝试过但不知道问题出在哪里,如果有人可以检查代码并告诉我出了什么问题,我将不胜感激。

import itertools

from Bio import SeqIO

RNA_codon_table = {
'A': ('GCU','GCC','GCA','GCG'),'C': ('UGU','UGC'),'D': ('GAU','GAC'),'E': ('GAA','GAG'),'F': ('UUU','UUC'),'G': ('GGU','GGC','GGA','GGG'),'H': ('CAU','CAC'),'I': ('AUU','AUC','AUA'),'K': ('AAA','AAG'),'L': ('UUA','UUG','CUU','CUC','cua','CUG'),'M': ('AUG',),'N': ('AAU','AAC'),'P': ('CCU','CCC','CCA','CCG'),'Q': ('CAA','CAG'),'R': ('CGU','CGC','CGA','CGG','AGA','AGG'),'S': ('UCU','UCC','UCA','UCG','AGU','AGC'),'T': ('ACU','ACC','ACA','ACG'),'V': ('GUU','GUC','GUA','GUG'),'W': ('UGG','Y': ('UAU','UAC'),}
 
def protein_fasta (protein_file):
   protein_sequence = []
   protein = SeqIO.parse(protein_file,format = 'fasta')
   for Seqrecord in protein: 
      protein_sequence.append(Seqrecord.seq)
   print (protein_sequence)


for seq in protein_sequence:
     codons = [ list(RNA_codon_table[key]) for key in protein_sequence ]
print(list(itertools.product(codons)))

对不起,我不知道如何附加 fasta 文件,但这是里面的序列:

>seq_compl complete sequence
IEEATHMTPCYELHglrWVQIQDYAINVMQCL

这是我得到的错误

 KeyError                                  Traceback (most recent call last)
<ipython-input-65-3dd46947c505> in <module>
----> 1 all_combinations ('short_protein.fasta')

<ipython-input-64-45a50fffc1d9> in all_combinations(protein_file)
      5        protein_sequence.append(Seqrecord.seq)
      6 
----> 7    codons = [ list(RNA_codon_table[key]) for key in protein_sequence 
]
      8    print(list(itertools.product(codons)))

<ipython-input-64-45a50fffc1d9> in <listcomp>(.0)
      5        protein_sequence.append(Seqrecord.seq)
      6 
----> 7    codons = [ list(RNA_codon_table[key]) for key in protein_sequence 
 ]
      8    print(list(itertools.product(codons)))

 KeyError: Seq('IEEATHMTPCYELHglrWVQIQDYAINVMQCL')

解决方法

根据您的示例,protein_sequence 变量当前在 protein_fasta 函数的局部范围内声明。

您需要先将此函数的结果分配给一个变量,然后才能对其进行迭代。

例如,将您的印刷品切换为退货:

def protein_fasta (protein_file):
   protein_sequence = []
   protein = SeqIO.parse(protein_file,format = 'fasta')
   for Seqrecord in protein: 
      protein_sequence.append(Seqrecord.seq)
   return protein_sequence

并确保调用并分配函数的结果:

protein_sequence = protein_fasta(protein_file)

现在你有了可以迭代的东西。

我发现你的 for 循环还有一个问题。你没有对 seq 做任何事情。在这种情况下,可能应该将 protein_sequence 换成 seq。我还取出了包装 RNA_codon_table 的列表,因为我认为在这种情况下不需要它:

for seq in protein_sequence:
    codons = [ RNA_codon_table[key] for key in seq ]
    print(list(itertools.product(*codons)))
,

您的蛋白质串将产生数十亿种组合:

from itertools import product,islice
def protGen(proteins):
   for codons in product(*(RNA_codon_table[P] for P in proteins)):
       yield "".join(codons) 

计数组合:

proteins = "IEEATHMTPCYELHGLRWVQIQDYAINVMQCL"
    
count = 1
for P in proteins: count *= len(RNA_codon_table[P])

print(count) # 37,572,373,905,408 combinations

输出:

for protSeq in islice(protGen(proteins),500): # first 500
    print(protSeq)

AUUGAAGAAGCUACUCAUAUGACUCCUUGUUAUGAAUUACAUGGUUUACGUUGGGUUCAAAUUCAAGAUUAUGCUAUUAAUGUUAUGCAAUGUUUA
AUUGAAGAAGCUACUCAUAUGACUCCUUGUUAUGAAUUACAUGGUUUACGUUGGGUUCAAAUUCAAGAUUAUGCUAUUAAUGUUAUGCAAUGUUUG
AUUGAAGAAGCUACUCAUAUGACUCCUUGUUAUGAAUUACAUGGUUUACGUUGGGUUCAAAUUCAAGAUUAUGCUAUUAAUGUUAUGCAAUGUCUU
AUUGAAGAAGCUACUCAUAUGACUCCUUGUUAUGAAUUACAUGGUUUACGUUGGGUUCAAAUUCAAGAUUAUGCUAUUAAUGUUAUGCAAUGUCUC
AUUGAAGAAGCUACUCAUAUGACUCCUUGUUAUGAAUUACAUGGUUUACGUUGGGUUCAAAUUCAAGAUUAUGCUAUUAAUGUUAUGCAAUGUCUA
AUUGAAGAAGCUACUCAUAUGACUCCUUGUUAUGAAUUACAUGGUUUACGUUGGGUUCAAAUUCAAGAUUAUGCUAUUAAUGUUAUGCAAUGUCUG
AUUGAAGAAGCUACUCAUAUGACUCCUUGUUAUGAAUUACAUGGUUUACGUUGGGUUCAAAUUCAAGAUUAUGCUAUUAAUGUUAUGCAAUGCUUA
AUUGAAGAAGCUACUCAUAUGACUCCUUGUUAUGAAUUACAUGGUUUACGUUGGGUUCAAAUUCAAGAUUAUGCUAUUAAUGUUAUGCAAUGCUUG
AUUGAAGAAGCUACUCAUAUGACUCCUUGUUAUGAAUUACAUGGUUUACGUUGGGUUCAAAUUCAAGAUUAUGCUAUUAAUGUUAUGCAAUGCCUU
AUUGAAGAAGCUACUCAUAUGACUCCUUGUUAUGAAUUACAUGGUUUACGUUGGGUUCAAAUUCAAGAUUAUGCUAUUAAUGUUAUGCAAUGCCUC
AUUGAAGAAGCUACUCAUAUGACUCCUUGUUAUGAAUUACAUGGUUUACGUUGGGUUCAAAUUCAAGAUUAUGCUAUUAAUGUUAUGCAAUGCCUA
AUUGAAGAAGCUACUCAUAUGACUCCUUGUUAUGAAUUACAUGGUUUACGUUGGGUUCAAAUUCAAGAUUAUGCUAUUAAUGUUAUGCAAUGCCUG
AUUGAAGAAGCUACUCAUAUGACUCCUUGUUAUGAAUUACAUGGUUUACGUUGGGUUCAAAUUCAAGAUUAUGCUAUUAAUGUUAUGCAGUGUUUA
AUUGAAGAAGCUACUCAUAUGACUCCUUGUUAUGAAUUACAUGGUUUACGUUGGGUUCAAAUUCAAGAUUAUGCUAUUAAUGUUAUGCAGUGUUUG
AUUGAAGAAGCUACUCAUAUGACUCCUUGUUAUGAAUUACAUGGUUUACGUUGGGUUCAAAUUCAAGAUUAUGCUAUUAAUGUUAUGCAGUGUCUU
AUUGAAGAAGCUACUCAUAUGACUCCUUGUUAUGAAUUACAUGGUUUACGUUGGGUUCAAAUUCAAGAUUAUGCUAUUAAUGUUAUGCAGUGUCUC
AUUGAAGAAGCUACUCAUAUGACUCCUUGUUAUGAAUUACAUGGUUUACGUUGGGUUCAAAUUCAAGAUUAUGCUAUUAAUGUUAUGCAGUGUCUA

版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。