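How can I return a sequence code string as a dictionary that tells me how often each amino acid occurs?
Basically, I am trying to write a program that runs through a string and returns every amino acid it encodes along with how often each one occurs. The program I wrote gives me the names but not the counts. Can anyone help?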
DNA_Codons = {
# U
'UUU': 'Phenylalanin','UCU': 'Serin','UAU': 'Tyrosin','UGU': 'Cystein',# UxU
'UUC': 'Phenylalanin','UCC': 'Serin','UAC': 'Tyrosin','UGC': 'Cystein',# UxC
'UUA': 'Leucin','UCA': 'Serin','UAA': '---','UGA': '---',# UxA
'UUG': 'Leucin','UCG': 'Serin','UAG': '---','UGG': 'Tryptophan',# UxG
# C
'CUU': 'Leucin','CCU': 'Prolin','CAU': 'Histidin','CGU': 'Arginin',# CxU
'CUC': 'Leucin','CCC': 'Prolin','CAC': 'Histidin','CGC': 'Arginin',# CxC
'CUA': 'Leucin','CCA': 'Prolin','CAA': 'Glutamin','CGA': 'Arginin',# CxA
'CUG': 'Leucin','CCG': 'Prolin','CAG': 'Glutamin','CGG': 'Arginin',# CxG
# A
'AUU': 'Isoleucin','ACU': 'Threonin','AAU': 'Asparagin','AGU': 'Serin',# AxU
'AUC': 'Isoleucin','ACC': 'Threonin','AAC': 'Asparagin','AGC': 'Serin',# AxC
'AUA': 'Isoleucin','ACA': 'Threonin','AAA': 'Lysin','AGA': 'Arginin',# AxA
'AUG': 'Met','ACG': 'Threonin','AAG': 'Lysin','AGG': 'Arginin',# AxG
# G
'GUU': 'Valin','GCU': 'Alanin','GAU': 'Asparaginsäure','GGU': 'Glycin',# GxU
'GUC': 'Valin','GCC': 'Alanin','GAC': 'Asparaginsäure','GGC': 'Glycin',# GxC
'GUA': 'Valin','GCA': 'Alanin','GAA': 'Glutaminsäure','GGA': 'Glycin',# GxA
'GUG': 'Valin','GCG': 'Alanin','GAG': 'Glutaminsäure','GGG': 'Glycin' # GxG
}
def translate_code(seq,init_pos=0):
    return {
        DNA_Codons[seq[pos:pos + 3]]
        for pos in range(init_pos,len(seq) - 2,3)
    }

print(translate_code("ACAAUUGACACAUAUCGUCGAGGGUGGCCA"))
What I am looking for is something like this:
{'Threonin': 2,'Isoleucin': 1,'Asparaginsäure': 1,'Tyrosin': 1,'Arginin': 2,'Glycin': 1,'Tryptophan': 1,'Prolin': 1}
Solution
def count_amino_acids(seq,init_pos=0):
    # Count amino acids by looking each codon up in DNA_Codons
    count_dict = {}
    # go from the initial position to the last full codon in steps of 3
    for i in range(init_pos,len(seq) - 2,3):
        codon = seq[i:i+3] # get the codon
        aa = DNA_Codons[codon] # look up the amino acid
        if aa == '---': # stop at stop codons
            return count_dict
        count_dict[aa] = count_dict.get(aa,0) + 1 # increment the counter, starting from 0
    return count_dict

print(count_amino_acids("ACAAUUGACACAUAUCGUCGAGGGUGGCCA"))
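To illustrate the stop-codon behaviour of the function above, here is a minimal sketch with a deliberately tiny codon table. `SMALL_CODONS` and `count_until_stop` are hypothetical names used only for this example. Counting ends at the first '---' entry, so any codons after it are ignored.

```python
# Hypothetical, deliberately incomplete table, for illustration only.
SMALL_CODONS = {
    'AUG': 'Met',
    'ACA': 'Threonin',
    'GGG': 'Glycin',
    'UAA': '---',  # stop codon
}

def count_until_stop(seq, init_pos=0):
    counts = {}
    # Step through the sequence one full codon (3 letters) at a time.
    for pos in range(init_pos, len(seq) - 2, 3):
        aa = SMALL_CODONS[seq[pos:pos + 3]]
        if aa == '---':
            break  # nothing after a stop codon is counted
        counts[aa] = counts.get(aa, 0) + 1
    return counts

# AUG ACA ACA UAA GGG: the trailing GGG comes after the stop codon.
result = count_until_stop("AUGACAACAUAAGGG")
print(result)
```

If stop codons should be skipped rather than end the count, replacing `break` with `continue` would be one way to do it.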
def translate_code(seq,init_pos=0):
    final_codons = {}
    for pos in range(init_pos,len(seq) - 2,3):
        current_codon = DNA_Codons[seq[pos:pos + 3]]
        if current_codon in final_codons:
            final_codons[current_codon] += 1
        else:
            final_codons[current_codon] = 1
    return final_codons
This should work exactly as you specified.
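For context, the version in the question returned only names because braces around a single expression form a set comprehension, which collapses duplicates and carries no counts. A minimal sketch of the difference, using a hypothetical list `names` in place of the translated codons. `dict.get` is shown here as an equivalent shorthand for the if/else above, not as the answer's own code:

```python
# Hypothetical stand-in for the amino acids produced by translating a sequence.
names = ['Threonin', 'Isoleucin', 'Threonin']

# Set comprehension: each name is kept once, so the frequency is lost.
as_set = {name for name in names}

# Dictionary counting: get() returns 0 the first time a name is seen.
counts = {}
for name in names:
    counts[name] = counts.get(name, 0) + 1

print(as_set)
print(counts)
```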
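You can use the Counter class from the collections module.
To exclude the combinations you do not want, I suggest not putting them in the codon dictionary at all. I also inverted your dictionary to reduce repetition and make it easier to maintain.
Setup: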
acids = {
    'Alanin': ['GCA','GCC','GCG','GCU'],
    'Arginin': ['AGA','AGG','CGA','CGC','CGG','CGU'],
    'Asparagin': ['AAC','AAU'],
    'Asparaginsäure': ['GAC','GAU'],
    'Cystein': ['UGC','UGU'],
    'Glutamin': ['CAA','CAG'],
    'Glutaminsäure': ['GAA','GAG'],
    'Glycin': ['GGA','GGC','GGG','GGU'],
    'Histidin': ['CAC','CAU'],
    'Isoleucin': ['AUA','AUC','AUU'],
    'Leucin': ['CUA','CUC','CUG','CUU','UUA','UUG'],
    'Lysin': ['AAA','AAG'],
    'Met': ['AUG'],
    'Phenylalanin': ['UUC','UUU'],
    'Prolin': ['CCA','CCC','CCG','CCU'],
    'Serin': ['AGC','AGU','UCA','UCC','UCG','UCU'],
    'Threonin': ['ACA','ACC','ACG','ACU'],
    'Tryptophan': ['UGG'],
    'Tyrosin': ['UAC','UAU'],
    'Valin': ['GUA','GUC','GUG','GUU']
}
codons = { seq:name for name,sequences in acids.items() for seq in sequences }
Counting:
from collections import Counter

def translate_code(seq,init_pos=0):
    # Skip any codon that is not in the table (for example stop codons)
    return Counter( codons[seq[pos:pos + 3]]
                    for pos in range(init_pos,len(seq) - 2,3)
                    if seq[pos:pos + 3] in codons)
Output:
print(translate_code("ACAAUUGACACAUAUCGUCGAGGGUGGCCA"))
Counter({'Threonin': 2,'Arginin': 2,'Isoleucin': 1,'Asparaginsäure': 1,'Tyrosin': 1,'Glycin': 1,'Tryptophan': 1,'Prolin': 1})
Note that Counter is actually a dictionary (a subclass of dict). If needed, you can convert it back to a plain dictionary.
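A minimal sketch of that conversion, using a hypothetical list of amino acid names instead of a translated sequence. It also shows two Counter conveniences that a plain dict lacks: a missing key reads as 0, and `most_common` sorts by frequency.

```python
from collections import Counter

# Hypothetical input standing in for the translated codons.
counts = Counter(['Threonin', 'Arginin', 'Threonin', 'Prolin'])

plain = dict(counts)          # convert back to an ordinary dictionary
top = counts.most_common(1)   # list of (name, count) pairs, most frequent first
missing = counts['Valin']     # a Counter returns 0 instead of raising KeyError

print(plain)
print(top)
print(missing)
```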