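How to generate a sequence based on subsequences in Python
I am trying to generate the following sequence.
text = ACCCEBCE
target = 000000D0
The text is random, drawn from a set of distinct characters. If one of the following subsequences has been found in the text, the target is D or E; otherwise the target is 0.
ABC --> D
BCD --> E
I wrote the code below. It works well when I generate a small number of characters, but if I set timesteps = 1000 or so, it produces no output.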
import string
import random as rn
import numpy as np

def is_subseq(x, y):
    it = iter(y)
    return all(any(c == ch for c in it) for ch in x)

def count(a, b, m, n):
    # If both first and second string
    # is empty, or if second string
    # is empty, return 1
    if ((m == 0 and n == 0) or n == 0):
        return 1
    # If only first string is empty
    # and second string is not empty,
    # return 0
    if (m == 0):
        return 0
    # If last characters are same
    # Recur for remaining strings by
    # 1. considering last characters
    # of both strings
    # 2. ignoring last character
    # of first string
    if (a[m - 1] == b[n - 1]):
        return (count(a, b, m - 1, n - 1) +
                count(a, b, m - 1, n))
    else:
        # If last characters are different,
        # ignore last char of first string
        # and recur for remaining string
        return count(a, b, m - 1, n)

# create a sequence classification instance
def get_sequence(n_timesteps):
    alphabet = "ABCDE"  # string.ascii_uppercase
    text = ''.join(rn.choices(alphabet, k=n_timesteps))
    print(text)
    seq_length = 3
    subseqX = []
    subseqY = []
    for i in range(0, len(alphabet) - seq_length, 1):
        seq_in = alphabet[i:i + seq_length]
        seq_out = alphabet[i + seq_length]
        subseqX.append([char for char in seq_in])
        subseqY.append(seq_out)
        print(seq_in, "\t-->\t", seq_out)
    y2 = []
    match = 0
    countlist = np.zeros(len(subseqX))
    for i, val in enumerate(text):
        found = False
        counter = 0
        for g, val2 in enumerate(subseqX):
            listToStr = ''.join(map(str, subseqX[g]))
            howmany = count(text[:i], listToStr, len(text[:i]), len(listToStr))
            if is_subseq(listToStr, text[:i]):
                if countlist[g] < howmany:
                    match = match + howmany
                    countlist[g] = howmany
                    temp = g
                    found = True
        if found:
            y2.append(subseqY[temp])
        else:
            y2.append(0)
    print("counter:\t", counter)
    print(text)
    print(y2)

# define problem properties
n_timesteps = 100
get_sequence(n_timesteps)
This is probably because of the depth of the recursive function, but I need to generate 1000 or 10000 characters. How can I solve this problem? Any ideas?
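Solution
I'm not sure I understand everything you are trying to do (there is a lot of code there), but I believe this simplified form of the function should work. It maintains a set of the subsequences seen so far and extends them only by appending the next letter of the alphabet when that letter is encountered. This lets the flagging step know whether the prefix of a sequence leading up to the current character has already been seen.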
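def flagSequence(S,letters="ABCDE",seqLen=3):
    subSeqs = set()   # runs of consecutive letters seen so far as subsequences
    result = "0"      # nothing can be flagged at the first position
    for c in S[:-1]:
        p = letters.index(c)
        subSeqs.add(c)
        if p>0:
            # extend every run that ends with the letter just before c
            subSeqs.update([s+c for s in subSeqs if s[-1]==letters[p-1]])
        # the flag for the next position is the letter following the completed run
        if p in range(seqLen-1,len(letters)-1) and letters[p-seqLen+1:p+1] in subSeqs:
            result += letters[p+1]
        else:
            result += "0"
    return result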
Output:
text = "BDBACCBECEECAEAEDCAACBCCDDDBBDEEDABDBDE"
print(text)
print(flagSequence(text))
BDBACCBECEECAEAEDCAACBCCDDDBBDEEDABDBDE
000000000D00D0000ED00D0DDEEE00E00E00E0E
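To make the set-based idea easier to follow, here is a small tracing sketch. It repeats the update step of flagSequence and records the set after each character. The helper name trace_subsequences is hypothetical and used only for illustration; it is not part of the answer's function.

```python
def trace_subsequences(S, letters="ABCDE"):
    """Return a sorted snapshot of the tracked set after each character of S.

    Hypothetical helper for illustration; it mirrors the update step of
    flagSequence but does no flagging.
    """
    sub_seqs = set()
    snapshots = []
    for c in S:
        p = letters.index(c)
        sub_seqs.add(c)
        if p > 0:
            # The list is built completely before update() runs, so the set
            # is never modified while it is being iterated.
            sub_seqs.update([s + c for s in sub_seqs if s[-1] == letters[p - 1]])
        snapshots.append(sorted(sub_seqs))
    return snapshots


for ch, snap in zip("ACBC", trace_subsequences("ACBC")):
    print(ch, snap)
```

Because only runs of consecutive alphabet letters are ever stored, the set stays small (bounded by the alphabet size, not by the text length), which is what keeps each step cheap even for long texts.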
With more letters:
alphabet=string.ascii_uppercase
text = ''.join(rn.choices(alphabet,k=10000))
flags = flagSequence(text,alphabet)
print(text[:60])
print(flags[:60])
CHUJKAMWCAAIBXGIZFHALAWWFDDELXREMOQQVXFPNYJRQESRVEJKIAQILYSJ...
000000000000000000000M000000FM00FN00000G0OZK0RFTS0FKLJ0RJMZT...
With longer sequences:
alphabet=string.ascii_uppercase
text = ''.join(rn.choices(alphabet,k=10000))
flags = flagSequence(text,alphabet,seqLen=10)
print(text[200:260])
print(flags[200:260])
...PMZCDQXAOHVMTRLYCNCJABGGNZYAWIHJJCQKMMAENQFHNQTOQOPPGHVQZXZU...
...00N0000Y000WN000Z0O0K0000O0Z0X00KK00LNN00O000O00P0PQQ00WR0Y0...
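Regarding the recursion depth suspected in the question: the recursive count reduces m by one per call, so its depth grows with the length of the text prefix, and CPython's default recursion limit is typically 1000. It also recomputes the same subproblems many times. If the occurrence counts themselves are still needed, one possible replacement is an iterative dynamic-programming version. The sketch below is an assumption about what would fit, not the answer's method, and the name count_subseq is hypothetical.

```python
def count_subseq(a, b):
    """Count the ways b occurs as a subsequence of a, without recursion."""
    # ways[j] = number of ways to form b[:j] from the characters of a seen so far
    ways = [1] + [0] * len(b)
    for ch in a:
        # Walk right to left so ways[j - 1] still holds the value from before
        # this character; each character of a is then used at most once per match.
        for j in range(len(b), 0, -1):
            if b[j - 1] == ch:
                ways[j] += ways[j - 1]
    return ways[len(b)]


print(count_subseq("ACCCEBCE", "ABC"))  # 1
print(count_subseq("AABBCC", "ABC"))    # 8
```

This runs in time proportional to len(a) * len(b) and returns the same values as the recursive version for the base cases (1 when b is empty, 0 when only a is empty).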