
Finding a list of strings in a UTF-16 hex .bin file and recording their offsets


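Let me start by saying that I am very inexperienced with code. I took a few classes ten years ago and can remember some basic principles, but that's about it. There is no language I am familiar with or actively use. Anyway, on to my problem.

I have a list of selected names that I am trying to find in a large .bin file of names, and I want to record the offset of each one. Each name may also have multiple matches, so I need every position recorded in its own column (assuming some kind of tabular output).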

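I can open the .bin file in a hex editor such as HxD or HexEditorNeo and see the names in the "decoded text" pane. The file is UTF-16, so HexEditorNeo lets me set that encoding to get rid of the "." between each character (these are not actual periods, just how the editor displays the 00 null bytes).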

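I can search for a name with the Find tool, and I can view and copy its offset. However, I have several thousand names, so doing this by hand is tedious.

Here is an example of the input files I will have and the desired output: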

Selected_Names.txt

John Williams
Howard Shore
Hans Zimmer

Master_Name_File.bin

47 00 61 00 6E 00 64 00 61 00 6C 00 66 00 00 00
48 00 6F 00 77 00 61 00 72 00 64 00 20 00 53 00 
68 00 6F 00 72 00 65 00 00 00 44 00 61 00 72 00 
6B 00 20 00 4B 00 6E 00 69 00 67 00 68 00 74 00 
00 00 48 00 61 00 6E 00 73 00 20 00 5A 00 69 00 
6D 00 6D 00 65 00 72 00 00 00 4C 00 75 00 6B 00 
65 00 20 00 53 00 6B 00 79 00 77 00 61 00 6C 00 
6B 00 65 00 72 00 00 00 4A 00 6F 00 68 00 6E 00 
20 00 57 00 69 00 6C 00 6C 00 69 00 61 00 6D 00 
73 00 00 00 48 00 6F 00 77 00 61 00 72 00 64 00 
20 00 53 00 68 00 6F 00 72 00 65 00 00 00 48 00 
61 00 6E 00 73 00 20 00 5A 00 69 00 6D 00 6D 00 
65 00 72 00 00 00 00 00 00 00 00 00 00 00 00 00

G.a.n.d.a.l.f...
H.o.w.a.r.d. .S.
h.o.r.e...D.a.r.
k. .K.n.i.g.h.t.
..H.a.n.s. .Z.i.
m.m.e.r...L.u.k.
e. .S.k.y.w.a.l.
k.e.r...J.o.h.n.
 .W.i.l.l.i.a.m.
s...H.o.w.a.r.d.
 .S.h.o.r.e...H.
a.n.s. .Z.i.m.m.
e.r.............
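
For anyone who wants to reproduce the sample, here is a minimal Python sketch that rebuilds the same bytes in memory. It assumes what the dump suggests: each name is stored as UTF-16LE (two bytes per character, low byte first), followed by a two-byte null, and the file is zero-padded to 0xD0 bytes. `SAMPLE_NAMES` and `build_sample` are names made up for this illustration.

```python
# Names in the order they appear in the sample dump.
SAMPLE_NAMES = [
    "Gandalf", "Howard Shore", "Dark Knight", "Hans Zimmer",
    "Luke Skywalker", "John Williams", "Howard Shore", "Hans Zimmer",
]

def build_sample(names, total_size=0xD0):
    """Join UTF-16LE names, each followed by a 2-byte null, then zero-pad."""
    data = b"".join(n.encode("utf-16le") + b"\x00\x00" for n in names)
    return data.ljust(total_size, b"\x00")

sample = build_sample(SAMPLE_NAMES)

# First row of the dump: "Gandalf" plus its null terminator.
# bytes.hex() with a separator needs Python 3.8 or later.
print(sample[:16].hex(" ").upper())
# -> 47 00 61 00 6E 00 64 00 61 00 6C 00 66 00 00 00
```

Writing `sample` to disk with `open('Master_Name_File.bin', 'wb')` gives a small test file to try a search script against before running it on the real data.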

Desired output

John Williams,00 00 00 78
Howard Shore,00 00 00 10,00 00 00 94
Hans Zimmer,00 00 00 42,00 00 00 AE

I tried to think through what this might look like in code and came up with the following pseudocode:

// get list of names to search for in array
nameArray = read file of selected names to search for // this is from a txt list
nameCount = length(nameArray)
nameCounter = 0

// get master name file
masterNameArray = read master file of names to search within  // this is the hex file in UTF-16
masterNameCount = length(masterNameArray)

// loop through each name we're searching for
while nameCounter < nameCount

     // start the position over at 0 for each new name we are searching
     offset = 0
     match = 0

     // loop through each position of the masterNameArray
     while offset < masterNameCount

          if nameArray(nameCounter) == masterNameArray(offset)  // check if names match. THIS IS HEX, though, so a straight check can't be done. need to convert, as well as account for how much of the array to check (i.e. name length)

               // record current offset position. record in new column for each match,since there may be multiple matches
               masterNamePosition(nameCounter,match) = offset
               match = match + 1
          end if

          offset = offset + 1
     end while

     nameCounter = nameCounter + 1

end while

write masterNamePosition to file

Thank you to anyone willing to read this and help! It means a lot to me!

Solution

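#!/usr/bin/python3

# Read the whole binary file into memory as bytes.
with open('Master_Name_File.bin', 'rb') as f:
  name_file = f.read()
# One name per line; skip blank lines, which would otherwise match at every offset.
with open('Selected_Names.txt') as f:
  names = [line for line in f.read().splitlines() if line]

def h(n):
  """Format an offset as space-separated hex bytes, e.g. 0x78 -> '00 00 00 78'."""
  s = '%08X' % n
  return ' '.join(s[i:i + 2] for i in range(0, len(s), 2))


for n in names:
  # The file stores each character as two bytes, low byte first (UTF-16LE).
  d = n.encode('utf-16le')
  indexes = []
  i = name_file.find(d)
  while i >= 0:
    indexes.append(i)
    i = name_file.find(d, i + 1)  # resume just past the previous hit
  # Names with no match are not printed.
  if indexes:
    print(f'{n},{",".join(h(i) for i in indexes)}')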

Output:

John Williams,00 00 00 78
Howard Shore,00 00 00 10,00 00 00 94
Hans Zimmer,00 00 00 42,00 00 00 AE
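
The question also asks for the result to be written to a file, one column per match. A possible variation along those lines is sketched below; it is not part of the original answer. It rests on two assumptions: the file holds nothing but UTF-16LE text starting at offset 0, so a genuine match always begins at an even offset, and a plain CSV file is an acceptable table format. `find_offsets`, `format_offset`, `write_offsets`, and the file name `offsets.csv` are hypothetical names chosen for this example.

```python
import csv
import io

def find_offsets(data, name, unit=2):
    """Offsets where `name` (encoded as UTF-16LE) starts on a `unit`-byte boundary."""
    needle = name.encode("utf-16le")
    hits = []
    i = data.find(needle) if needle else -1  # an empty needle would match everywhere
    while i >= 0:
        if i % unit == 0:  # skip hits that straddle two characters
            hits.append(i)
        i = data.find(needle, i + 1)
    return hits

def format_offset(n):
    """Format an offset as space-separated hex bytes, e.g. 0x78 -> '00 00 00 78'."""
    s = f"{n:08X}"
    return " ".join(s[i:i + 2] for i in range(0, len(s), 2))

def write_offsets(data, names, out):
    """Write one CSV row per name: the name, then one column per match."""
    writer = csv.writer(out)
    for name in names:
        # Names with no match still get a row, just with no offset columns.
        writer.writerow([name] + [format_offset(i) for i in find_offsets(data, name)])

# In-memory demo with made-up names. Real use would pass the bytes read from
# Master_Name_File.bin and a file opened with open("offsets.csv", "w", newline="").
demo = b"".join(n.encode("utf-16le") + b"\x00\x00" for n in ["Ann Lee", "Bob Ray", "Ann Lee"])
buf = io.StringIO()
write_offsets(demo, ["Ann Lee", "Bob Ray", "Cy Doe"], buf)
print(buf.getvalue())
```

The alignment check matters because a byte-level search can also hit at an odd offset, where the bytes belong to two different characters. If the real file has a header of odd length, the check would need adjusting or dropping. Note too that this still matches substrings, so "John Williams" would also be found inside "John Williamson".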
