How to fix "AttributeError: 'Seq' object has no attribute 'tostring'" when encoding FASTA sequences?
```python
import pandas as pd
import numpy as np
from Bio import SeqIO
import time
import h5py


def vectorizeSequence(seq):
    # The order of the letters is not arbitrary.
    # Flip the matrix up-down and left-right for the reverse complement.
    ltrdict = {'a': [1, 0, 0, 0], 'c': [0, 1, 0, 0], 'g': [0, 0, 1, 0],
               't': [0, 0, 0, 1], 'n': [0, 0, 0, 0]}
    return np.array([ltrdict[x] for x in seq])


starttime = time.time()
fasta_sequences = SeqIO.parse(open("contigs.fasta"), 'fasta')
#fasta_sequences = str(seq.seq)
#GC(fasta_sequences)
with h5py.File('genomeEncoded.h5', 'w') as hf:
    for fasta in fasta_sequences:
        # Get the record ID and its sequence.
        name, sequence = fasta.id, fasta.seq.tostring()  # the error is raised here
        # Write the chromosome name
        new_file.write(name)
        # Encoding scheme
        data = vectorizeSequence(sequence.lower())
        print(name + " is one hot encoded!")
        # Write to HDF5
        hf.create_dataset(name, data=data)
        print(name + " is written to dataset")
endtime = time.time()
print("Encoding is done in " + str(endtime))
```
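The comment in `vectorizeSequence` says that flipping the matrix gives the reverse complement. That only holds if the one-hot columns follow A, C, G, T order, so that complementary bases (A/T, C/G) sit at mirrored column positions. Here is a minimal NumPy sketch of the idea, assuming that 4-column layout with N encoded as all zeros. The helpers `one_hot` and `decode` are hypothetical names used only for illustration.

```python
import numpy as np

# Assumed layout: columns in A, C, G, T order; N (unknown base) is all zeros.
LTRDICT = {
    "a": [1, 0, 0, 0],
    "c": [0, 1, 0, 0],
    "g": [0, 0, 1, 0],
    "t": [0, 0, 0, 1],
    "n": [0, 0, 0, 0],
}


def one_hot(seq):
    """Hypothetical helper: one-hot encode a nucleotide string as an (L, 4) array."""
    return np.array([LTRDICT[x] for x in seq.lower()])


def decode(mat):
    """Hypothetical helper: map each row back to a letter; all-zero rows become 'n'."""
    letters = "acgt"
    return "".join(letters[row.argmax()] if row.any() else "n" for row in mat)


fwd = one_hot("AACGTN")
# Flipping up-down reverses the base order; flipping left-right swaps
# A<->T and C<->G because those columns mirror each other.
rc = np.flip(fwd)  # equivalent to np.flipud(np.fliplr(fwd))
print(decode(rc))  # nacgtt
```

The reverse complement of AACGTN is NACGTT, which is what the flipped matrix decodes to.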
Running the script fails with:

```
Traceback (most recent call last):
  File "FASTA_ENCODING4ML.py", line 30, in <module>
    name,sequence = fasta.id,fasta.seq.tostring()
AttributeError: 'Seq' object has no attribute 'tostring'
```
Solution
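Seq.tostring() was deprecated and has been removed from newer Biopython releases. To convert a Biopython Seq object to a string, use str() instead.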
For example:

```python
>>> from Bio.Seq import Seq
>>> str(Seq('ATCGTGC'))
'ATCGTGC'
```
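Applied to the script in the question, the fix is to replace fasta.seq.tostring() with str(fasta.seq). The following is a sketch of a corrected version, not the original author's code. It makes several assumptions. It uses the 4-column A, C, G, T one-hot dictionary. The function names `encode_fasta` and `main` are hypothetical. The `new_file.write(name)` line is dropped because `new_file` is never defined in the posted snippet. The final message reports elapsed time rather than the raw `time.time()` timestamp.

```python
import time

import numpy as np
from Bio import SeqIO

# Assumed one-hot layout: A, C, G, T columns; N (unknown base) is all zeros.
# Complementary bases sit at mirrored columns, so flipping the matrix
# up-down and left-right yields the reverse complement.
LTRDICT = {
    "a": [1, 0, 0, 0],
    "c": [0, 1, 0, 0],
    "g": [0, 0, 1, 0],
    "t": [0, 0, 0, 1],
    "n": [0, 0, 0, 0],
}


def vectorize_sequence(seq):
    """One-hot encode a lowercase nucleotide string as an (L, 4) array."""
    return np.array([LTRDICT[x] for x in seq])


def encode_fasta(handle):
    """Hypothetical helper: yield (record id, one-hot array) per FASTA record.

    `handle` may be a filename or an open text handle, as SeqIO.parse accepts both.
    """
    for record in SeqIO.parse(handle, "fasta"):
        # str() replaces the removed Seq.tostring() method.
        sequence = str(record.seq)
        yield record.id, vectorize_sequence(sequence.lower())


def main(fasta_path="contigs.fasta", out_path="genomeEncoded.h5"):
    """Hypothetical entry point: encode every record and store it in HDF5."""
    import h5py  # only needed for writing the output file

    start = time.time()
    with h5py.File(out_path, "w") as hf:
        for name, data in encode_fasta(fasta_path):
            hf.create_dataset(name, data=data)
            print(name + " is one hot encoded and written to dataset")
    print("Encoding is done in %.2f s" % (time.time() - start))
```

Call `main()` to process `contigs.fasta`, or pass other paths, e.g. `main("my.fasta", "my.h5")`. As in the original, any letter outside a/c/g/t/n (for example an IUPAC ambiguity code such as R or Y) still raises a KeyError in `vectorize_sequence`.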