How do I change part of a file name when the file name is a variable in Python?


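I have a Python script that takes a file as a command-line argument, does what it needs to do, and then writes an output file whose name is the input name with _all_ORF.fsa_aa appended. I would like to edit the file name rather than append to it, but I am confused about how to do that when the file name is held in a variable.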

Here is an example of how the script is called from the command line:

gL=genomeList.txt   #Text file containing a list of genomes to loop through.             

for i in $(cat ${gL}); do
    #some other stuff ; 
    python ./find_all_ORF_from_getorf.py ${i}_getorf.fsa_aa ; 
    done

And here is part of the Python script (find_all_ORF_from_getorf.py):

import re,sys

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

infile = sys.argv[1]

with open(f'{infile}_all_ORF.fsa_aa'.format(),"a") as file_object:
    for sequence in SeqIO.parse(infile,"fasta"):
       #do some stuff
       print(f'{sequence.description}_ORF_from_position_{h.start()},\n{sequence.seq[h_start:]}',file=file_object)

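Currently the output file is named Genome_file_getorf.fsa_aa_all_ORF.fsa_aa. I want to drop the first fsa_aa so that the output looks like this: Genome_file_getorf_all_ORF.fsa_aa. How can I do that? I don't know how to edit the name.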

I looked at os.rename, but it doesn't seem to let me edit the name stored in the variable, only append to it.

Thanks,

Solution

Regarding your bash code, you may find the following snippet useful. I find it more readable, and I tend to use it a lot when looping over lines.

while read line; do
    #some other stuff ; 
    python ./find_all_ORF_from_getorf.py ${line}_getorf.fsa_aa ; 
done < genomeList.txt

Now, about your question and your Python code:

import re,sys 

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

infile = sys.argv[1]

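At this point infile will be something like "Genome_file_getorf.fsa_aa". One option is to split this string on '.' and take the first item. This assumes the only dot in the name is the one before the extension: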

name = infile.split('.')[0]

If you know there may be more than one '.' in the file name, for example "Myfile.out.old", and you only want to strip the last extension:

name = infile.rsplit('.',1)[0]
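The difference between the two calls shows up as soon as a name contains more than one dot. The sketch below wraps each call in a small helper (the helper names are made up for illustration) and prints the results for a few sample names, including one with a leading "./", where splitting on the first dot returns an empty string.

```python
def before_first_dot(filename: str) -> str:
    """Everything before the FIRST '.' (what split('.')[0] gives)."""
    return filename.split('.')[0]


def without_last_extension(filename: str) -> str:
    """Everything before the LAST '.' (what rsplit('.', 1)[0] gives)."""
    return filename.rsplit('.', 1)[0]


samples = (
    'Genome_file_getorf.fsa_aa',
    'Myfile.out.old',
    './Genome_file_getorf.fsa_aa',
)

for sample in samples:
    # repr() makes an empty result visible in the printed output
    print(repr(sample), repr(before_first_dot(sample)),
          repr(without_last_extension(sample)))
```

For the file names in the question both helpers return the same thing. rsplit is the safer default if the script may ever be called with a relative path such as ./Genome_file_getorf.fsa_aa.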

A third option, if you know every file ends in '.fsa_aa', is to slice the string with a negative index. Since '.fsa_aa' is 7 characters long:

name = infile[:-7]
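A fixed slice silently removes the last 7 characters even when the name does not end in '.fsa_aa'. A slightly more defensive sketch checks the suffix first; drop_suffix and SUFFIX are hypothetical names chosen for this example, not part of the original script.

```python
SUFFIX = '.fsa_aa'  # assumed extension of every input file


def drop_suffix(filename: str, suffix: str = SUFFIX) -> str:
    """Remove suffix from the end of filename, if it is present."""
    if suffix and filename.endswith(suffix):
        return filename[:-len(suffix)]
    return filename  # leave names with a different ending untouched


name = drop_suffix('Genome_file_getorf.fsa_aa')
print(f'{name}_all_ORF.fsa_aa')
```

On Python 3.9 or newer, the built-in str.removesuffix method does the same job: 'Genome_file_getorf.fsa_aa'.removesuffix('.fsa_aa').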

All three options rely on Python's built-in string handling (string methods and slicing); see the official Python docs for details.

outfile = f'{name}_all_ORF.fsa_aa' 
# if you wrote f'{variable}' you don't need the ".format()"
# On the other hand you can do '{}'.format(variable)
# or even '{variable}'.format(variable=SomeOtherVariable)

with open(outfile,"a") as file_object:
    for sequence in SeqIO.parse(infile,"fasta"):
       #do some stuff
       # unlike print, write() does not add a newline, so end the record with one
       file_object.write(f'{sequence.description}_ORF_from_position_{h.start()},\n{sequence.seq[h_start:]}\n')
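The comments about f-strings and .format() in the snippet above can be checked directly. This is a minimal sketch with a made-up value for name; it shows that the three spellings build the same string and that calling .format() on an f-string adds nothing here.

```python
name = 'Genome_file_getorf'  # example value, as produced by the options above

with_fstring = f'{name}_all_ORF.fsa_aa'
with_positional = '{}_all_ORF.fsa_aa'.format(name)
with_keyword = '{stem}_all_ORF.fsa_aa'.format(stem=name)

# The question's form: the f-string is already fully built,
# so .format() has no placeholders left to fill.
redundant = f'{name}_all_ORF.fsa_aa'.format()

print(with_fstring)
print(with_fstring == with_positional == with_keyword == redundant)
```

The redundant call is harmless for these names, but it would misbehave if the finished string happened to contain literal braces, so it is better left out.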

Another option is to use Path from the pathlib library, which I would recommend. In that case you have to make a few other small changes to your code:

import re,sys
from pathlib import Path # <- Here

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

infile = Path(sys.argv[1]) # <- Here
outfile = infile.stem + '_all_ORF.fsa_aa' # <- Here 
# And if you want outfile to be a path in the same directory as the input,
# I would suggest instead:
# outfile = infile.parent.joinpath(infile.stem + '_all_ORF.fsa_aa')

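with open(outfile,"a") as file_object:
    for sequence in SeqIO.parse(infile,"fasta"):
       #do some stuff
       file_object.write(f'{sequence.description}_ORF_from_position_{h.start()},\n{sequence.seq[h_start:]}\n')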
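The pathlib pieces used above can be tried on their own, without Biopython. The sketch below assumes a hypothetical input path under a data/ directory; output_path is an illustrative helper, not something from the original script. It uses Path.with_name, which swaps the final path component while keeping the parent directory.

```python
from pathlib import Path


def output_path(infile: str, suffix: str = '_all_ORF.fsa_aa') -> Path:
    """Build the output path next to infile, replacing its last extension."""
    path = Path(infile)
    # stem is the file name without its last extension
    return path.with_name(path.stem + suffix)


result = output_path('data/Genome_file_getorf.fsa_aa')
print(result.name)    # file name only
print(result.parent)  # directory is preserved
```

Note that infile.stem + '_all_ORF.fsa_aa' on its own is a plain string with no directory part, so the output would land in the current working directory. Keeping the parent, as here, writes it beside the input file.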

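Finally, as you can see, in both cases I replaced the print call with the file_object.write method. I consider it better practice to write to the file directly rather than print into it. Keep in mind that write does not add a trailing newline the way print does.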
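One practical difference between the two approaches is the trailing newline. The following sketch uses in-memory buffers (io.StringIO) in place of real files and a made-up record string, so it can be run without any input data.

```python
import io

record = 'seq1_ORF_from_position_12,\nMKTAYIAK'  # made-up example record

printed = io.StringIO()
print(record, file=printed)   # print appends '\n' by default

written = io.StringIO()
written.write(record)         # write emits exactly what it is given

print(repr(printed.getvalue()))
print(repr(written.getvalue()))
```

When switching from print to write, each record therefore needs an explicit '\n' at the end, otherwise consecutive records run together on one line.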
