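When building a custom training dataset from images found online, you may run into this problem: even after the images have been batch-renamed, the XML files produced by labelImg still record each image's original web filename. Those entries need to be corrected in bulk before training. For example: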
<annotation>
<folder>测试图片</folder>
<filename>ae2f50b6a937df1e1a72f9bcc45b172d.jpg</filename>
<path>F:\项目图像数据集\ae2f50b6a937df1e1a72f9bcc45b172d.jpg</path>
<source>
<database>UnkNown</database>
</source>
<size>
<width>800</width>
<height>800</height>
<depth>3</depth>
</size>
<segmented>0</segmented>
<object>
<name>class1</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndBox>
<xmin>631</xmin>
<ymin>275</ymin>
<xmax>714</xmax>
<ymax>509</ymax>
</bndBox>
</object>
</annotation>
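Before rewriting anything, it can help to confirm which annotation files are actually out of sync. The following is a minimal sketch, not part of the original scripts. It assumes each XML file shares its base name with its image (for example 000000000000.xml and 000000000000.jpg), and the function name find_mismatched is hypothetical.

```python
import os
import xml.etree.ElementTree as ET


def find_mismatched(xml_dir, img_ext='.jpg'):
    """Return (xml_name, recorded_filename) pairs whose <filename> differs
    from the XML file's own base name plus img_ext (an assumed convention)."""
    mismatched = []
    for name in sorted(os.listdir(xml_dir)):
        if not name.endswith('.xml'):
            continue  # skip anything that is not an annotation file
        root = ET.parse(os.path.join(xml_dir, name)).getroot()
        recorded = root.findtext('filename', default='')
        expected = os.path.splitext(name)[0] + img_ext
        if recorded != expected:
            mismatched.append((name, recorded))
    return mismatched
```

Calling find_mismatched(r'D:\test\xmltest\xml_source') would list every file that still carries a stale name; an empty list means nothing needs fixing.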
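First, fix the path: replace the value of the path element with the real location of the image that the XML file describes. Here the images are named with 12-digit, zero-padded sequence numbers.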
import os
import xml.dom.minidom

path = r'D:\test\xmltest\xml_source'  # directory containing the source XML files
sv_path = r'D:\test\xmltest\xml_save'  # directory for the modified XML files
img_dir = 'D:/test/xmltest/JPEGimage/'  # directory that holds the renamed images
files = sorted(os.listdir(path))  # os.listdir order is arbitrary, so sort before numbering
cnt = 0
for xmlFile in files:
    dom = xml.dom.minidom.parse(os.path.join(path, xmlFile))  # parse the XML file into a DOM
    root = dom.documentElement  # the root <annotation> element
    item = root.getElementsByTagName('path')  # every <path> node in the file
    for i in item:
        i.firstChild.data = img_dir + str(cnt).zfill(12) + '.jpg'  # image path for this XML file
    with open(os.path.join(sv_path, xmlFile), 'w', encoding='utf-8') as fh:
        dom.writexml(fh)
    cnt += 1
After the change, the file looks like this:
<?xml version="1.0" ?><annotation>
<folder>测试图片</folder>
<filename>ae2f50b6a937df1e1a72f9bcc45b172d.jpg</filename>
<path>D:/test/xmltest/JPEGimage/000000000000.jpg</path>
<source>
<database>UnkNown</database>
</source>
<size>
<width>800</width>
<height>800</height>
<depth>3</depth>
</size>
<segmented>0</segmented>
<object>
<name>class1</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndBox>
<xmin>631</xmin>
<ymin>275</ymin>
<xmax>714</xmax>
<ymax>509</ymax>
</bndBox>
</object>
</annotation>
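The counter-based numbering only lines up with the images if both sets of files are enumerated in the same order, and os.listdir returns entries in arbitrary order. As a hedged illustration, the sketch below builds the old-to-new name mapping from a sorted list. The helper name plan_renames is hypothetical, and it assumes that sorting the XML names and sorting the image names yields the same pairing (true when each XML shares its image's base name).

```python
import os


def plan_renames(names, width=12):
    """Map each name to a zero-padded sequential name, keeping its extension.

    Names are sorted first so the numbering does not depend on the order
    in which the operating system lists the directory.
    """
    plan = {}
    for index, old in enumerate(sorted(names)):
        ext = os.path.splitext(old)[1]
        plan[old] = str(index).zfill(width) + ext
    return plan
```

Applying the same function to the image names and to the XML names gives matching numbers for matching pairs, which can then be passed to os.rename.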
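Next, update the filename element. Each XML file is assumed to already share its base name with its image, so the new value is the XML file's own name with a .jpg extension.
import os
import xml.dom.minidom

path = r'D:\test\xmltest\xml_source'  # input directory (use the previous step's output here)
sv_path = r'D:\test\xmltest\xml_save'  # directory for the modified XML files
files = os.listdir(path)
for xmlFile in files:
    dom = xml.dom.minidom.parse(os.path.join(path, xmlFile))  # parse the XML file into a DOM
    root = dom.documentElement  # the root <annotation> element
    names = root.getElementsByTagName('filename')  # every <filename> node in the file
    a, b = os.path.splitext(xmlFile)  # a is the base name, b is the extension
    for n in names:
        n.firstChild.data = a + '.jpg'
    with open(os.path.join(sv_path, xmlFile), 'w', encoding='utf-8') as fh:
        dom.writexml(fh)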
<?xml version="1.0" ?><annotation>
<folder>测试图片</folder>
<filename>000000000000.jpg</filename>
<path>D:/test/xmltest/JPEGimage/000000000000.jpg</path>
<source>
<database>UnkNown</database>
</source>
<size>
<width>800</width>
<height>800</height>
<depth>3</depth>
</size>
<segmented>0</segmented>
<object>
<name>class1</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndBox>
<xmin>631</xmin>
<ymin>275</ymin>
<xmax>714</xmax>
<ymax>509</ymax>
</bndBox>
</object>
</annotation>
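The two passes above can also be combined. The following is a minimal sketch using xml.etree.ElementTree instead of minidom. It is an alternative, not the method used in this post, and it assumes the XML file's base name already matches its image. The function name fix_annotation and its parameters are hypothetical.

```python
import os
import xml.etree.ElementTree as ET


def fix_annotation(xml_path, out_path, img_dir, img_ext='.jpg'):
    """Set <filename> and <path> from the XML file's base name and save a copy."""
    stem = os.path.splitext(os.path.basename(xml_path))[0]
    img_name = stem + img_ext
    tree = ET.parse(xml_path)
    root = tree.getroot()
    for node in root.iter('filename'):
        node.text = img_name
    for node in root.iter('path'):
        node.text = img_dir.rstrip('/') + '/' + img_name
    # xml_declaration=False keeps the <?xml ...?> line out of the saved file
    tree.write(out_path, encoding='utf-8', xml_declaration=False)
    return img_name
```

Deriving both values from the file name avoids the running counter, so the result does not depend on directory listing order.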
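The saved XML files now begin with an XML declaration. Training directly on these files may raise an error, so the <?xml version="1.0" ?> declaration also needs to be removed. Any other element that should be deleted or replaced can be handled in the same pass.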
# -*- coding:utf-8 -*-
# Replace a with b in every XML file
import os

xmldir = r'D:\test\xmltest\xml_source'  # input directory (use the previous step's output here)
savedir = r'D:\test\xmltest\xml_save'  # directory for the modified XML files
xmllist = os.listdir(xmldir)
for xml in xmllist:
    if xml.endswith('.xml'):
        print(xml)
        with open(os.path.join(xmldir, xml), 'r', encoding='utf-8') as fi:
            content = fi.readlines()
        # The output must be opened in write mode; open() defaults to read-only
        with open(os.path.join(savedir, xml), 'w', encoding='utf-8') as fo:
            for line in content:
                # line = line.replace('a', 'b')  # example: replace a with b
                line = line.replace('<?xml version="1.0" ?>', '')
                line = line.replace('<folder>测试图片</folder>', '<folder>车辆图片</folder>')
                line = line.replace('<name>class1</name>', '<name>class2</name>')
                fo.write(line)
        print('Replacement done')
# If b is an empty string, a is simply deleted
All done. The final file looks like this:
<annotation>
<folder>车辆图片</folder>
<filename>000000000000.jpg</filename>
<path>D:/test/xmltest/JPEGimage/000000000000.jpg</path>
<source>
<database>UnkNown</database>
</source>
<size>
<width>800</width>
<height>800</height>
<depth>3</depth>
</size>
<segmented>0</segmented>
<object>
<name>class2</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndBox>
<xmin>631</xmin>
<ymin>275</ymin>
<xmax>714</xmax>
<ymax>509</ymax>
</bndBox>
</object>
</annotation>
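Plain string replacement only matches when the tag and its text appear exactly as written, with no extra whitespace. As a hedged alternative for the class rename, the sketch below edits the name element of each object through xml.etree.ElementTree. The function name rename_labels and the mapping argument are hypothetical, and the input is assumed to be a str whose declaration, if any, carries no encoding attribute.

```python
import xml.etree.ElementTree as ET


def rename_labels(xml_text, mapping):
    """Return xml_text with each <object>/<name> value replaced per mapping."""
    root = ET.fromstring(xml_text)
    for obj in root.iter('object'):
        name = obj.find('name')
        if name is not None and name.text in mapping:
            name.text = mapping[name.text]
    # encoding='unicode' returns a str and emits no XML declaration
    return ET.tostring(root, encoding='unicode')
```

For instance, rename_labels(text, {'class1': 'class2'}) changes only the labels listed in the mapping and leaves every other object untouched.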
Reference code
Batch-modifying XML file attributes (filename/path) in Python, 代码先锋网: https://www.codeleading.com/article/67672212062/
Python: batch replacing or deleting XML content in a folder, 南石北岸生的博客 (CSDN): https://blog.csdn.net/gusui7202/article/details/85194806