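How to find a list of strings in a UTF-16 binary (.bin) file and record their offsets
Let me start by saying that I'm very inexperienced with code. I took a few courses ten years ago and can remember some basic principles, but that's about it. There is no language I'm familiar with or actively use. Anyway, on to my problem.
I have a list of selected names that I'm trying to find in a large .bin master file of names, and I want to record the offset of each one. A name may also have multiple matches, so I need each position recorded in a new column (assuming some kind of tabular output).
I can open the .bin file in a hex editor such as HxD or HexEditorNeo and see the names in the "decoded text" pane. The file is UTF-16, so HexEditorNeo lets me set that encoding to get rid of the "." between each character (not an actual period, just how the editor displays the 00 null bytes).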
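To make the "." observation concrete, here is a minimal sketch of how one of the names looks once encoded as UTF-16 little-endian, which is what the dump further down suggests the file uses (each ASCII letter followed by a 00 byte). The variable names are only illustrative.

```python
# Encode a name the way it appears in the .bin file:
# UTF-16 little-endian, without a byte-order mark.
name = "Howard Shore"
encoded = name.encode("utf-16le")

# Render the bytes as space-separated hex pairs, like a hex editor would.
hex_view = " ".join(f"{b:02X}" for b in encoded)
print(hex_view)

# Decoding the same bytes restores the original text.
decoded = encoded.decode("utf-16le")
print(decoded)
```

The printed hex pairs should line up with the bytes at offset 0x10 in the dump below, so searching for the encoded bytes rather than the plain text is what makes the match possible.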
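I can use the find tool to search for a name, and then view and copy its offset. However, I have a few thousand names, so doing this by hand is tedious.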
Selected_Names.txt
John Williams
Howard Shore
Hans Zimmer
Master_Name_File.bin
47 00 61 00 6E 00 64 00 61 00 6C 00 66 00 00 00
48 00 6F 00 77 00 61 00 72 00 64 00 20 00 53 00
68 00 6F 00 72 00 65 00 00 00 44 00 61 00 72 00
6B 00 20 00 4B 00 6E 00 69 00 67 00 68 00 74 00
00 00 48 00 61 00 6E 00 73 00 20 00 5A 00 69 00
6D 00 6D 00 65 00 72 00 00 00 4C 00 75 00 6B 00
65 00 20 00 53 00 6B 00 79 00 77 00 61 00 6C 00
6B 00 65 00 72 00 00 00 4A 00 6F 00 68 00 6E 00
20 00 57 00 69 00 6C 00 6C 00 69 00 61 00 6D 00
73 00 00 00 48 00 6F 00 77 00 61 00 72 00 64 00
20 00 53 00 68 00 6F 00 72 00 65 00 00 00 48 00
61 00 6E 00 73 00 20 00 5A 00 69 00 6D 00 6D 00
65 00 72 00 00 00 00 00 00 00 00 00 00 00 00 00
G.a.n.d.a.l.f...
H.o.w.a.r.d. .S.
h.o.r.e...D.a.r.
k. .K.n.i.g.h.t.
..H.a.n.s. .Z.i.
m.m.e.r...L.u.k.
e. .S.k.y.w.a.l.
k.e.r...J.o.h.n.
 .W.i.l.l.i.a.m.
s...H.o.w.a.r.d.
 .S.h.o.r.e...H.
a.n.s. .Z.i.m.m.
e.r.............
Desired output
John Williams,00 00 00 78
Howard Shore,00 00 00 10,00 00 00 94
Hans Zimmer,00 00 00 42,00 00 00 AE
// get list of names to search for in array
nameArray = read file of selected names to search for // this is from a txt list
nameCount = length(nameArray)
nameCounter = 0

// get master name file
masterNameArray = read master file of names to search within // this is the hex file in UTF-16
masterNameCount = length(masterNameArray)

// loop through each name we're searching for
while nameCounter < nameCount
    // start the position over at 0 for each new name we are searching
    offset = 0
    match = 0
    // loop through each position of the masterNameArray
    while offset < masterNameCount
        if nameArray(nameCounter) == masterNameArray(offset) // check if names match. THIS IS HEX, though, so a straight check can't be done. need to convert, as well as account for how much of the array to check (i.e. name length)
            // record current offset position. record in new column for each match, since there may be multiple matches
            masterNamePosition(nameCounter, match) = offset
            match = match + 1
        end if
        offset = offset + 1
    end while
    nameCounter = nameCounter + 1
end while

write masterNamePosition to file
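The comparison step that the pseudocode leaves open (converting the name and checking only as many bytes as the name is long) could be sketched in Python roughly as follows. This is a naive sliding-window version meant only to illustrate the idea; `naive_find_all` and the sample data are made up for the example, and the UTF-16LE encoding is assumed from the dump above.

```python
def naive_find_all(haystack, name):
    """Return every byte offset where `name` (as UTF-16LE) occurs in `haystack`."""
    # Convert the text to the same byte layout the file uses.
    needle = name.encode("utf-16le")
    hits = []
    # Stop at the last position where the whole needle still fits.
    for offset in range(len(haystack) - len(needle) + 1):
        # Compare exactly len(needle) bytes starting at this offset.
        if haystack[offset:offset + len(needle)] == needle:
            hits.append(offset)
    return hits

# Tiny stand-in for the master file: two null-terminated names.
sample = "Gandalf\0Howard Shore\0".encode("utf-16le")
print(naive_find_all(sample, "Howard Shore"))  # [16]
```

Checking every offset this way is slow on a large file; the built-in `bytes.find` does the same job much faster.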
Thanks to anyone willing to read this and help! It means a lot to me!
Solution
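#!/usr/bin/python3

# Read the whole binary file into memory as bytes.
with open('Master_Name_File.bin', 'rb') as f:
    name_file = f.read()

# One name per line; blank lines are skipped so they cannot match everywhere.
with open('Selected_Names.txt') as f:
    names = [line for line in f.read().splitlines() if line]

def h(n):
    """Format an offset as four space-separated hex bytes, e.g. 120 -> '00 00 00 78'."""
    s = '%08X' % n
    return ' '.join(s[i:i + 2] for i in range(0, len(s), 2))

for n in names:
    # The file stores text as UTF-16 little-endian (each ASCII char followed by 00).
    d = n.encode('utf-16le')
    indexes = []
    i = name_file.find(d)
    while i >= 0:
        indexes.append(i)
        # Resume one byte later so that every occurrence is found.
        i = name_file.find(d, i + 1)
    if indexes:
        print(f'{n},{",".join(h(i) for i in indexes)}')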
Output:
John Williams,00 00 00 78
Howard Shore,00 00 00 10,00 00 00 94
Hans Zimmer,00 00 00 42,00 00 00 AE
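Since the question asks for tabular output written to a file, here is a rough sketch of the same search wrapped in functions that write rows with Python's `csv` module. The names `find_offsets` and `write_table` are invented for this example. The `aligned` option assumes that strings in the file start on even byte offsets, as they do in the sample dump; if that does not hold for the real file, pass `aligned=False`.

```python
import csv
import io

def find_offsets(data, name, aligned=True):
    """Return every byte offset at which `name` occurs in `data` as UTF-16LE."""
    pattern = name.encode("utf-16le")
    offsets = []
    pos = data.find(pattern)
    while pos != -1:
        # Assumption: 2-byte code units start on even offsets, so an odd
        # hit would be a coincidental byte match rather than real text.
        if not aligned or pos % 2 == 0:
            offsets.append(pos)
        pos = data.find(pattern, pos + 1)
    return offsets

def write_table(out, data, names):
    """Write one row per matched name: the name, then one column per offset."""
    writer = csv.writer(out, lineterminator="\n")
    for name in names:
        offsets = find_offsets(data, name)
        if offsets:
            cells = [" ".join(f"{b:02X}" for b in o.to_bytes(4, "big"))
                     for o in offsets]
            writer.writerow([name] + cells)

# In-memory stand-in for Master_Name_File.bin, rebuilt from the dump above:
# each name is followed by a two-byte null terminator.
master_names = ["Gandalf", "Howard Shore", "Dark Knight", "Hans Zimmer",
                "Luke Skywalker", "John Williams", "Howard Shore", "Hans Zimmer"]
data = b"".join(n.encode("utf-16le") + b"\x00\x00" for n in master_names)

buf = io.StringIO()
write_table(buf, data, ["John Williams", "Howard Shore", "Hans Zimmer"])
print(buf.getvalue(), end="")
```

To write to disk instead of the in-memory buffer, the `out` argument could be a file opened with something like `open("offsets.csv", "w", newline="")` (the filename is just a placeholder), and `data` would come from reading the real .bin file in binary mode.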