How to use DTW + kNN - 编程之家


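I have been trying to distinguish between words by extracting MFCCs from audio (with the librosa library), then applying dynamic time warping (DTW) and classifying the recordings with kNN.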
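For reference, DTW aligns two sequences of MFCC frames by dynamic programming and returns the cost of the cheapest alignment. This is a minimal from-scratch sketch, not the dtw package's implementation; the name `dtw_distance` is hypothetical, and it assumes the same L1 frame distance as the `dtw` call quoted in this question.

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Classic DTW between two sequences of feature frames.

    a has shape (n, d) and b has shape (m, d): one row per frame,
    i.e. the transpose of librosa's (n_mfcc, n_frames) layout.
    """
    n, m = len(a), len(b)
    # acc[i, j] = cheapest cost of aligning the first i frames of a with the first j of b
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # L1 (Manhattan) distance between frame i-1 of a and frame j-1 of b
            cost = np.abs(a[i - 1] - b[j - 1]).sum()
            # extend the cheapest of: diagonal match, step in a only, step in b only
            acc[i, j] = cost + min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    return float(acc[n, m])
```

Because frames may be matched more than once, a time-stretched copy such as `[[0], [0], [1], [2]]` against `[[0], [1], [2]]` gets distance 0, which is why DTW suits the same word spoken at different speeds.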

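For example, I am trying to distinguish between the words "cat" and "anything". My problem is that I cannot find any difference in similarity between the word "anything" and two different pronunciations of the word "cat": according to DTW, the distances between these three recordings all seem about equal. I tried reducing and increasing the number of MFCC coefficients, and preprocessing the MFCCs (standardization and mean removal), but nothing seems to help.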
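The standardization and mean removal mentioned above is often done as per-utterance cepstral mean and variance normalization (CMVN): each coefficient is centered and scaled over the frames of one recording. A minimal sketch, assuming librosa's (n_mfcc, n_frames) layout; `normalize_mfcc` is a hypothetical helper, and dropping coefficient 0 (which mostly reflects overall loudness) is an extra assumption, not something the question says was tried.

```python
import numpy as np

def normalize_mfcc(mfcc: np.ndarray, drop_c0: bool = True) -> np.ndarray:
    """Per-utterance cepstral mean and variance normalization (CMVN).

    mfcc has librosa's layout: shape (n_mfcc, n_frames).
    """
    if drop_c0:
        # Assumption: coefficient 0 mostly tracks log energy (loudness), not the word
        mfcc = mfcc[1:]
    mean = mfcc.mean(axis=1, keepdims=True)  # per-coefficient mean over time
    std = mfcc.std(axis=1, keepdims=True)    # per-coefficient spread over time
    return (mfcc - mean) / (std + 1e-8)      # epsilon avoids division by zero
```

Each recording would be normalized this way before transposing it and passing it to DTW.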

I am using the DTW function from the dtw package: `dist, cost, acc_cost, path = dtw(mfcc3.T, mfcc2.T, dist=lambda x, y: norm(x - y, ord=1))`
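The `dist` argument is the local cost between two frames, and the lambda above is the L1 norm. A sketch of that cost next to a cosine distance, as one alternative that could be passed instead; the helper names are hypothetical, and whether cosine works better on these MFCCs is not established here. L1 grows with the magnitude of the frames, while cosine distance compares only their direction.

```python
import numpy as np

def l1_distance(x: np.ndarray, y: np.ndarray) -> float:
    # Same local cost as the question's lambda: norm(x - y, ord=1)
    return float(np.abs(x - y).sum())

def cosine_distance(x: np.ndarray, y: np.ndarray) -> float:
    # 1 - cos(angle between the frames); ignores each frame's overall magnitude
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(1.0 - np.dot(x, y) / (denom + 1e-12))  # epsilon guards all-zero frames
```

For a frame `f` and `2 * f`, the L1 distance equals the sum of `abs(f)`, whereas the cosine distance is essentially 0.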

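My question is: why do you think I can't classify this data?

- Am I preprocessing the data insufficiently before comparing it with DTW?
- Do I need to tune DTW more cleverly so that it separates the distances between different words?
- Are kNN or DTW simply not enough in my case? How can I solve this?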
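On the second point, two common DTW adjustments are a Sakoe-Chiba band, which forbids alignments that drift too far from the diagonal, and dividing the accumulated cost by the sequence lengths so that longer recordings don't automatically get larger distances. A minimal sketch combining both, assuming (frames, features) input and an L1 frame cost; `banded_dtw` and the default `band=10` are illustrative choices, not tuned values.

```python
import numpy as np

def banded_dtw(a: np.ndarray, b: np.ndarray, band: int = 10) -> float:
    """DTW restricted to a Sakoe-Chiba band, normalized by n + m.

    a: (n, d) and b: (m, d) frame sequences with an L1 frame cost.
    """
    n, m = len(a), len(b)
    # The band must be at least |n - m| wide, otherwise cell (n, m) is unreachable
    w = max(band, abs(n - m))
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        # Only cells within w frames of the diagonal are allowed
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = np.abs(a[i - 1] - b[j - 1]).sum()
            acc[i, j] = cost + min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    # Dividing by n + m keeps short and long recordings on a comparable scale
    return float(acc[n, m] / (n + m))
```

A narrow band makes it harder for very different words to be "warped" into a cheap match, at the risk of penalizing genuinely large tempo differences.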

Here are the main lines of the code:

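```python
import librosa
import numpy as np
from numpy.linalg import norm
from dtw import dtw
from sklearn.neighbors import KNeighborsClassifier

# dirname, mots (training file names), labels and dst (test file path) are defined elsewhere

def l1(x, y):
    # L1 (Manhattan) distance between two MFCC frames
    return norm(x - y, ord=1)

def load_mfcc(path):
    y, sr = librosa.load(path)
    # Shape (n_mfcc, n_frames); recent librosa versions require keyword arguments
    return librosa.feature.mfcc(y=y, sr=sr)

# Compute each training MFCC once instead of reloading the files inside the loop
mfccs = [load_mfcc(dirname + "/" + m) for m in mots]

distances = np.zeros((len(mots), len(mots)))
for i in range(len(mots)):
    for j in range(len(mots)):
        # .T gives one row per frame, which is what dtw expects
        dist, _, _, _ = dtw(mfccs[i].T, mfccs[j].T, dist=l1)
        distances[i, j] = dist  # distance between the spoken words i and j

label = ['cat', 'anything']  # the two classes

# Train a kNN classifier to determine if the audio is "cat" or "anything";
# labels[i] is the word spoken in mots[i]
classifier = KNeighborsClassifier(n_neighbors=3, metric='euclidean')
classifier.fit(distances, labels)

# Compare a sample with the training recordings to find which word it is most similar to
mfcc = load_mfcc(dst)
distanceTest = []
for i in range(len(mots)):
    dist, _, _, _ = dtw(mfcc.T, mfccs[i].T, dist=l1)
    distanceTest.append(dist)

# result
pre = classifier.predict([distanceTest])[0]
```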
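Note that this code fits a Euclidean kNN on rows of `distances`, so it compares each recording's vector of distances to all training recordings rather than using the DTW distances directly. A more conventional variant, swapped in here rather than taken from the question, votes among the k training recordings with the smallest DTW distance to the test sample. scikit-learn's `KNeighborsClassifier(metric='precomputed')` does this when fitted on the square `distances` matrix and queried with `[distanceTest]`; the sketch shows the same idea in plain NumPy under the hypothetical name `knn_predict_precomputed`.

```python
import numpy as np
from collections import Counter

def knn_predict_precomputed(dist_to_train, train_labels, k=3):
    """Majority vote among the k training recordings closest to the test sample.

    dist_to_train: DTW distances from the test recording to each training recording.
    train_labels: the word spoken in each training recording, in the same order.
    """
    nearest = np.argsort(dist_to_train)[:k]  # indices of the k smallest distances
    votes = Counter(train_labels[i] for i in nearest)
    # On a tie, most_common keeps first-encountered order, so the closer label wins
    return votes.most_common(1)[0][0]
```

With this approach, `knn_predict_precomputed(distanceTest, labels)` would replace the `classifier` calls, and `distances` is no longer needed at prediction time.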