
Python: finding the timestamps of a specific sound in a .wav file

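I have a .wav file in which I recorded myself speaking for a few minutes. Say I want to find the exact times at which I say "Mike" in the audio. I looked into speech recognition and ran some tests with the Google Speech API, but the timestamps I got back were not accurate.

As an alternative, I recorded a very short .wav file in which I just say "Mike". I am trying to compare the two .wav files and find every timestamp at which "Mike" is said in the longer one. I came across SleuthEye's amazing answer.

The code works great for finding a single timestamp, but I can't figure out how to find multiple start/end times: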

import sys

import numpy as np
from scipy import signal
from scipy.io import wavfile

snippet = sys.argv[1]
source  = sys.argv[2]

# read the sample to look for
rate_snippet, snippet = wavfile.read(snippet)
snippet = np.array(snippet, dtype='float')

# read the source
rate, source = wavfile.read(source)
source = np.array(source, dtype='float')

# resample such that both signals are at the same sampling rate (if required)
if rate != rate_snippet:
    num = int(np.round(rate*len(snippet)/rate_snippet))
    snippet = signal.resample(snippet, num)

# compute the cross-correlation (default mode='full')
z = signal.correlate(source, snippet)

# in 'full' mode the highest peak marks the last sample of the best match
peak = np.argmax(np.abs(z))
start = (peak-len(snippet)+1)/rate
end   = peak/rate

print("start {} end {}".format(start, end))

I'm new to audio and signal programming, so any suggestions would be appreciated. Thanks!

Solution

You're almost there. You can use find_peaks. For example:

import numpy as np
from scipy.io import wavfile
from scipy import signal
import matplotlib.pyplot as plt

snippet = 'snippet.wav'
source  = 'source.wav'

# read the sample to look for (first channel only, so the file must be multi-channel)
rate_snippet, snippet = wavfile.read(snippet)
snippet = np.array(snippet[:,0], dtype='float')

# read the source (first channel only)
rate, source = wavfile.read(source)
source = np.array(source[:,0], dtype='float')

# resample such that both signals are at the same sampling rate (if required)
if rate != rate_snippet:
    num = int(np.round(rate*len(snippet)/rate_snippet))
    snippet = signal.resample(snippet, num)
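
The `[:,0]` indexing above only works when `wavfile.read` returns a 2-D array, which it does for multi-channel files; a mono recording comes back 1-D and the indexing raises an `IndexError`. A minimal sketch of a loader that accepts both follows. The helper names `to_mono_float` and `load_mono` are hypothetical, and averaging the channels (instead of taking the first one, as the answer does) is an assumption of this sketch.

```python
import numpy as np
from scipy.io import wavfile


def to_mono_float(data):
    """Convert samples to a 1-D float array.

    Mono input (1-D) is only cast to float. Multi-channel input
    (shape: samples x channels) is averaged across channels, which is
    an assumption of this sketch; the answer keeps channel 0 instead.
    """
    data = np.asarray(data, dtype=float)
    if data.ndim == 2:
        data = data.mean(axis=1)
    return data


def load_mono(path):
    """Read a .wav file and return (rate, samples) with 1-D float samples."""
    rate, data = wavfile.read(path)
    return rate, to_mono_float(data)
```

With these helpers, `rate, source = load_mono('source.wav')` could stand in for the two-line read-and-convert steps, regardless of how many channels the recording has.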

My source and snippet:

# after the resampling step above, the snippet is at the source rate
x_snippet = np.arange(0,snippet.size) / rate

plt.plot(x_snippet,snippet)
plt.xlabel('seconds')
plt.title('snippet')

x_source = np.arange(0,source.size) / rate

plt.plot(x_source,source)
plt.xlabel('seconds')
plt.title('source')

Now we compute the correlation

# compute the cross-correlation
z = signal.correlate(source,snippet,mode='same')

I used mode='same' so that source and z have the same length

source.size == z.size
True
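
The choice of mode also changes what a peak index means. In 'full' mode the peak sits on the last sample of the match, while in 'same' mode the output is centered on the 'full' output, so the peak sits roughly in the middle of the match. The sketch below checks this on synthetic data (random noise standing in for real audio, which is an assumption of the example) by recovering a known start offset from both modes.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)

# Synthetic stand-ins for the recordings: a noise-like snippet buried
# in a much quieter background at a known position.
snippet = rng.standard_normal(200)
source = 0.05 * rng.standard_normal(2000)
true_start = 700
source[true_start:true_start + snippet.size] += snippet

z_full = signal.correlate(source, snippet, mode='full')
z_same = signal.correlate(source, snippet, mode='same')

# 'full': the peak index is the last sample of the match.
start_from_full = int(np.argmax(z_full)) - (snippet.size - 1)

# 'same': the peak index is the middle of the match.
start_from_same = int(np.argmax(z_same)) - snippet.size // 2

print(true_start, start_from_full, start_from_same)
```

Both conversions recover the same start sample, which is why subtracting half the snippet length is the right correction when working with mode='same'.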

Now we can define a minimum peak height, for example

x_z = np.arange(0,z.size) / rate

plt.plot(x_z,z)
plt.axhline(2e20,color='r')
plt.title('correlation')

and find the peaks that are at least a minimum distance apart (you may need to define your own height and distance depending on your samples)

peaks = signal.find_peaks(
    z,height=2e20,distance=50000
)

peaks
(array([ 117390,225754,334405,449319,512001,593854,750686,873026,942586,1064083]),{'peak_heights': array([8.73666562e+20,9.32871542e+20,7.23883305e+20,9.30772354e+20,4.32924341e+20,9.18323020e+20,1.12473608e+21,1.07752019e+21,1.12455724e+21,1.05061734e+21])})
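
The values 2e20 and 50000 are specific to this recording. One possible way to avoid hand-tuning, sketched below, is to derive both from the data: take the height as a fraction of the largest correlation value and the distance as the snippet length, since two matches cannot overlap. The function name `find_occurrences` and the default `rel_height=0.5` are hypothetical choices for illustration, and the demo uses synthetic noise in place of real audio.

```python
import numpy as np
from scipy import signal


def find_occurrences(source, snippet, rate, rel_height=0.5):
    """Return (start, end) times in seconds for each match of snippet.

    rel_height is the threshold as a fraction of the highest
    correlation peak (an assumed heuristic, not a universal value).
    """
    z = signal.correlate(source, snippet, mode='same')
    peaks, _ = signal.find_peaks(
        z,
        height=rel_height * z.max(),
        distance=snippet.size,  # matches cannot be closer than one snippet
    )
    # In 'same' mode a peak sits in the middle of the match.
    starts = peaks - snippet.size // 2
    return [(s / rate, (s + snippet.size) / rate) for s in starts]


# Synthetic demo: the same snippet placed at three known offsets.
rng = np.random.default_rng(0)
rate = 1000
snippet = rng.standard_normal(200)
source = 0.05 * rng.standard_normal(5000)
for offset in (500, 2000, 3500):
    source[offset:offset + snippet.size] += snippet

hits = find_occurrences(source, snippet, rate)
for i, (start, end) in enumerate(hits):
    print(f"peak {i}: start {start:.2f} end {end:.2f}")
```

A relative threshold still assumes that every occurrence is about as loud as the strongest one; quieter repetitions may need a lower `rel_height`.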

We take the peak indices

peaks_idxs = peaks[0]

plt.plot(x_z,z)
plt.plot(x_z[peaks_idxs],z[peaks_idxs],'or')

Because the peaks are "almost" in the middle of each match, we can do

fig,ax = plt.subplots(figsize=(12,5))
plt.plot(x_source,source)
plt.xlabel('seconds')
plt.title('source signal and correlation')
for i,peak_idx in enumerate(peaks_idxs):
    # half a snippet before and after the peak, converted to seconds
    start = (peak_idx-snippet.size/2) / rate
    center = (peak_idx) / rate
    end   = (peak_idx+snippet.size/2) / rate
    plt.axvline(start,color='g')
    plt.axvline(center,color='y')
    plt.axvline(end,color='r')
    print(f"peak {i}: start {start:.2f} end {end:.2f}")

peak 0: start 2.34 end 2.98
peak 1: start 4.80 end 5.44
peak 2: start 7.27 end 7.90
peak 3: start 9.87 end 10.51
peak 4: start 11.29 end 11.93
peak 5: start 13.15 end 13.78
peak 6: start 16.71 end 17.34
peak 7: start 19.48 end 20.11
peak 8: start 21.06 end 21.69
peak 9: start 23.81 end 24.45

But maybe there is a better way to define the start and end more precisely.
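
One possible refinement, offered as a sketch and not as the answer's method, is to correlate with mode='valid' and normalize the result. In 'valid' mode index k scores the window source[k:k+len(snippet)], so a peak index is the start sample itself with no half-length correction. Normalizing by the energy of each window keeps the score between -1 and 1, so one threshold works for loud and quiet repetitions alike. The function name `normalized_xcorr`, the 0.8 threshold, and the synthetic test signal are all assumptions of this example.

```python
import numpy as np
from scipy import signal


def normalized_xcorr(source, snippet, eps=1e-12):
    """Zero-mean normalized cross-correlation.

    Entry k scores the window source[k:k + len(snippet)] and lies
    in [-1, 1]; windows with (near) zero energy score 0.
    """
    n = snippet.size
    template = snippet - snippet.mean()
    ones = np.ones(n)

    num = signal.correlate(source, template, mode='valid')
    # Sliding sums give the energy of each mean-removed source window.
    win_sum = signal.correlate(source, ones, mode='valid')
    win_sq = signal.correlate(source**2, ones, mode='valid')
    win_energy = np.maximum(win_sq - win_sum**2 / n, 0.0)

    denom = np.sqrt(win_energy * np.sum(template**2))
    return np.where(denom > eps, num / np.maximum(denom, eps), 0.0)


# Synthetic demo: the same snippet at three offsets and three volumes.
rng = np.random.default_rng(1)
rate = 1000
snippet = rng.standard_normal(200)
source = 0.05 * rng.standard_normal(5000)
for offset, gain in [(500, 1.0), (2000, 0.3), (3500, 2.0)]:
    source[offset:offset + snippet.size] += gain * snippet

score = normalized_xcorr(source, snippet)
starts, _ = signal.find_peaks(score, height=0.8, distance=snippet.size)
for i, s in enumerate(starts):
    print(f"peak {i}: start {s / rate:.3f} end {(s + snippet.size) / rate:.3f}")
```

The reported start and end are still the boundaries of the snippet recording, so any silence before or after the word in the snippet is included. Trimming that silence from the snippet first would tighten the interval. Real speech also varies between repetitions, so scores on actual recordings will likely be lower than on this synthetic signal and the threshold would need adjusting.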
