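Finding a motif in a FASTA file with Python
Can someone help me with this Python code? When I run it, nothing happens: no errors and nothing strange. It opens and reads the file just fine. I have a set of protein sequences in FASTA format and I need to find a motif in them, such as "RRTxSKxxxxAxxRxG", where x stands for any amino acid. I need to find the actual sequences that occur where the x's are written.
Here is my Python code: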
import re
userinput = input("Please provide a FASTA file.")
while userinput:
    try:
        if userinput == "0":
            break
        with open(userinput,mode = 'r') as protein:
            readprotein = protein.read()
            matches = re.findall('RTxSKxxxxAxxRxG',readprotein)
            for match in matches:
                print(match)
        break
    except FileNotFoundError:
        print("File not found. enter the fasta file.")
        userinput = input("Please provide a FASTA file. 0 to quit.")
Solution
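The code in the question most likely prints nothing because, in a regular expression, `x` matches only the literal letter x, so 'RTxSKxxxxAxxRxG' never matches a real protein sequence. A minimal sketch of the difference, assuming x is meant as "any uppercase letter"; the helper `motif_to_regex` and the sample sequence are made up for illustration and are not part of the question:

```python
import re

def motif_to_regex(motif: str, wildcard: str = "x") -> str:
    """Turn a motif such as 'RTxSKxxxxAxxRxG' into a regex string.

    Hypothetical helper: each wildcard character becomes [A-Z],
    every other character is kept as a literal.
    """
    parts = []
    for ch in motif:
        parts.append("[A-Z]" if ch == wildcard else re.escape(ch))
    return "".join(parts)

sequence = "MKRTGSKLLLLAGGRSGPP"  # made-up sequence, for illustration only

# The literal pattern looks for actual 'x' characters and finds nothing.
literal_hits = re.findall("RTxSKxxxxAxxRxG", sequence)

# The translated pattern treats every x as "any uppercase letter".
wildcard_hits = re.findall(motif_to_regex("RTxSKxxxxAxxRxG"), sequence)

print(literal_hits)
print(wildcard_hits)
```

Writing the pattern by hand with quantifiers, for example `[A-Z]{4}` for four consecutive wildcards, is equivalent and easier to read.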
My input file, fasta.fasta:
>PRIMO ['RTXSKXXXXAXXRXG']
>PRIMO2 ['RTGSKXXXXAGGRXG']
>TERZO []
>QUARTO ['RTGSKLLLLAGGRSG','RTGSKWFGRAGGRXG','RTGSKPPPPAGGRXG']
['RTXSKXXXXAXXRXG']
['RTGSKXXXXAGGRXG']
[]
['RTGSKLLLLAGGRSG','RTGSKPPPPAGGRXG']
Change your code to:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sat Jun 12 14:48:00 2021
@author: Pietro
https://stackoverflow.com/questions/67948483/find-motif-from-in-between-fasta-file-from-python
"""
import re

# userinput = input("Please provide a FASTA file.")
userinput = 'fasta.fasta'

# each x in the motif becomes [A-Z], i.e. any uppercase letter
pattern = re.compile(r"(RT[A-Z]SK[A-Z]{4}A[A-Z]{2}R[A-Z]G)")

matchz = []
while userinput:
    try:
        if userinput == "0":
            break
        with open(userinput,mode = 'r') as protein:
            for line in protein:  # memory efficient way
            #readprotein = protein.readlines()
            #for line in readprotein:
                # print(line)
                line = line.upper().strip("\n")
                if line.startswith('>'):
                    name=line
                else:
                    matches = re.findall(pattern,line)
                    print(name,matches)
                    matchz.append(matches)
        for match in matchz:
            print(match)
        break
    except FileNotFoundError:
        print("File not found. enter the fasta file.")
        userinput = input("Please provide a FASTA file. 0 to quit.")
Output:
>PRIMO ['RTXSKXXXXAXXRXG']
>PRIMO2 ['RTGSKXXXXAGGRXG']
>TERZO []
>QUARTO ['RTGSKLLLLAGGRSG','RTGSKPPPPAGGRXG']
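One caveat with searching line by line: FASTA files often wrap a sequence over several lines, and a motif that straddles a line break would then be missed. A possible variation, offered as a sketch rather than as part of the code above, joins the lines of each record before searching. The names `read_fasta` and `find_motifs` and the example records are hypothetical:

```python
import re

PATTERN = re.compile(r"RT[A-Z]SK[A-Z]{4}A[A-Z]{2}R[A-Z]G")

def read_fasta(lines):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)

def find_motifs(lines):
    """Map each header to the list of non-overlapping motif matches."""
    return {header: PATTERN.findall(seq) for header, seq in read_fasta(lines)}

# Made-up records: the motif in the first one is split across two lines.
example = [
    ">WRAPPED\n",
    "MKRTGSKLL\n",
    "LLAGGRSGPP\n",
    ">EMPTY\n",
    "MKVLAAGG\n",
]
results = find_motifs(example)
for header, hits in results.items():
    print(header, hits)
```

Because an open file is itself an iterable of lines, the same function can be called as `find_motifs(protein)` inside the `with open(...)` block. Note that `findall` reports non-overlapping matches only.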