
Finding a motif in a FASTA file with Python

How to find a motif in a FASTA file with Python

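Can someone help me with this Python code? When I run it, nothing happens: no errors and nothing unusual. It opens and reads the file just fine. I have a set of protein sequences in FASTA format, and I need to find a motif in them such as "RRTxSKxxxxAxxRxG", where each x marks a position that can be any residue.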

Here is my Python code:

import re
userinput = input("Please provide a FASTA file.")
while userinput:
    try:
        if userinput == "0":
            break
        with open(userinput,mode = 'r') as protein:
            readprotein = protein.read()
        matches = re.findall('RTxSKxxxxAxxRxG',readprotein)
        for match in matches:
            print(match)
        break
    except FileNotFoundError:
        print("File not found. enter the fasta file.")
        userinput = input("Please provide a FASTA file. 0 to quit.")

Solution

My input file, fasta.fasta:

>PRIMO ['RTXSKXXXXAXXRXG']
>PRIMO2 ['RTGSKXXXXAGGRXG']
>TERZO []
>QUARTO ['RTGSKLLLLAGGRSG','RTGSKWFGRAGGRXG','RTGSKPPPPAGGRXG']
['RTXSKXXXXAXXRXG']
['RTGSKXXXXAGGRXG']
[]
['RTGSKLLLLAGGRSG','RTGSKPPPPAGGRXG']

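Modify your code as follows. In a regular expression, x matches only the literal letter x, so your pattern never matches a real sequence and nothing is printed. The version below uses the character class [A-Z] for every variable position instead: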

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sat Jun 12 14:48:00 2021

@author: Pietro


https://stackoverflow.com/questions/67948483/find-motif-from-in-between-fasta-file-from-python


"""

import re


# userinput = input("Please provide a FASTA file.")

userinput = 'fasta.fasta'


pattern = re.compile(r"(RT[A-Z]SK[A-Z]{4}A[A-Z]{2}R[A-Z]G)")

matchz = []
while userinput:
    try:
        if userinput == "0":
            break
        with open(userinput,mode = 'r') as protein:
            # iterate over the file lazily instead of calling
            # protein.readlines() (memory efficient)
            for line in protein:
                line = line.upper().strip("\n")
                if line.startswith('>'):
                    # header line: remember the record name
                    name = line
                else:
                    # sequence line: collect every motif occurrence
                    matches = pattern.findall(line)
                    print(name, matches)
                    matchz.append(matches)
        for match in matchz:
            print(match)
        break
    except FileNotFoundError:
        print("File not found. enter the fasta file.")
        userinput = input("Please provide a FASTA file. 0 to quit.")
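The pattern in the script above was translated from the motif by hand. As a minimal sketch, assuming that a lowercase x is the only wildcard and that every other character must match literally, the conversion could be automated. The helper name motif_to_regex and the test sequence are hypothetical and not part of the original answer.

```python
import re


def motif_to_regex(motif, wildcard="x"):
    """Compile a motif such as 'RTxSKxxxxAxxRxG' into a regular expression.

    Each wildcard character becomes the class [A-Z]; every other
    character is upper-cased and matched literally.
    """
    parts = []
    for ch in motif:
        if ch == wildcard:
            parts.append("[A-Z]")
        else:
            parts.append(re.escape(ch.upper()))
    return re.compile("".join(parts))


pattern = motif_to_regex("RTxSKxxxxAxxRxG")

# Hypothetical test sequence containing one occurrence of the motif.
print(pattern.findall("MKRTGSKLLLLAGGRSGQQ"))  # ['RTGSKLLLLAGGRSG']
```

Note that an uppercase X in the motif is treated as a literal residue here, not as a wildcard, because only the exact wildcard character is replaced.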

Running the modified script on fasta.fasta prints:

>PRIMO ['RTXSKXXXXAXXRXG']
>PRIMO2 ['RTGSKXXXXAGGRXG']
>TERZO []
>QUARTO ['RTGSKLLLLAGGRSG','RTGSKPPPPAGGRXG']
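The script searches one line at a time, so it relies on each sequence sitting on a single line. FASTA files often wrap a sequence across several lines, and a motif that straddles a line break would then be missed. The following is a rough sketch of one way to handle that, by joining the lines of each record before searching. The function names read_fasta and find_motifs and the example records are assumptions made for illustration, not part of the original answer.

```python
import re


def read_fasta(lines):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def find_motifs(lines, pattern):
    """Return a list of (header, matches) pairs, one per FASTA record."""
    return [(header, pattern.findall(seq)) for header, seq in read_fasta(lines)]


pattern = re.compile(r"RT[A-Z]SK[A-Z]{4}A[A-Z]{2}R[A-Z]G")

# Hypothetical records; the motif in DEMO is split across two lines.
example = [">DEMO\n", "MKRTGSKLL\n", "LLAGGRSGQQ\n", ">EMPTY\n", "MKV\n"]
for header, hits in find_motifs(example, pattern):
    print(header, hits)
```

Because find_motifs accepts any iterable of lines, an open file handle can be passed directly, for example inside `with open('fasta.fasta') as protein:`.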
