Fine-tuning SOTA video models on your own dataset - Sign Language


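I am trying to implement a sign language classifier with the gluoncv API as part of my final-year university project.

Dataset: http://facundoq.github.io/datasets/lsa64/

I fine-tuned the models by following the "Fine-tuning SOTA video models on your own dataset" tutorial: https://cv.gluon.ai/build/examples_action_recognition/finetune_custom.html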

[Figure: accuracy graph for i3d_resnet50_v1_custom (I3D)]
[Figure: accuracy graph for slowfast_4x16_resnet50_custom (SlowFast)]

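The plotted graphs show almost 90% accuracy, but when I run inference I get misclassifications, even on videos that I trained on.

So I am stuck. Could you give me some guidance or any kind of help?

Thanks

My data loader for I3D: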

import os
import mxnet as mx
from mxnet import gluon
from gluoncv.data import VideoClsCustom
from gluoncv.data.transforms import video

num_gpus = 1
ctx = [mx.gpu(i) for i in range(num_gpus)]
transform_train = video.VideoGroupTrainTransform(
    size=(224, 224), scale_ratios=[1.0, 0.8],
    mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
per_device_batch_size = 5
num_workers = 0
batch_size = per_device_batch_size * num_gpus

train_dataset = VideoClsCustom(
    root=os.path.expanduser('DataSet/train/'),
    setting=os.path.expanduser('DataSet/train/train.txt'),
    train=True, new_length=64, new_step=2,
    video_loader=True, use_decord=True, transform=transform_train)
print('Load %d training samples.' % len(train_dataset))
train_data = gluon.data.DataLoader(train_dataset, batch_size=batch_size,
                                   shuffle=True, num_workers=num_workers)
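
To see which frames a clip covers for given new_length and new_step values, here is a minimal sketch. It assumes, as the tutorial describes, that new_length is the number of frames in a clip and new_step is the stride between them. The helper clip_frame_ids and the clamping to the last frame are my own illustration, not part of gluoncv.

```python
def clip_frame_ids(start, new_length, new_step, num_frames):
    """Return the frame indices of one clip.

    Assumption: new_length frames are taken, new_step apart, starting at
    `start`. Indices past the end of the video are clamped to the last frame
    (a design choice for this sketch, not gluoncv behaviour).
    """
    last = num_frames - 1
    return [min(start + i * new_step, last) for i in range(new_length)]


# Hypothetical 200-frame video
short_clip = clip_frame_ids(0, 32, 2, 200)   # same indices as range(0, 64, 2)
long_clip = clip_frame_ids(0, 64, 2, 200)    # 64 frames spanning indices 0..126
print(len(short_clip), short_clip[-1])
print(len(long_clip), long_clip[-1])
```

Under that assumption, new_length=64 with new_step=2 gives clips of 64 frames, so it may be worth confirming that training and inference sample clips the same way.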

Running inference:

import mxnet as mx
import numpy as np
from mxnet import nd
from gluoncv.data.transforms import video
from gluoncv.utils.filesystem import try_import_decord
decord = try_import_decord()

# Read 32 frames (every second frame of the first 64) from the test video
video_fname = 'DataSet/test/006_001_001.mp4'
vr = decord.VideoReader(video_fname)
frame_id_list = range(0, 64, 2)
video_data = vr.get_batch(frame_id_list).asnumpy()
clip_input = [video_data[vid, :, :, :] for vid, _ in enumerate(frame_id_list)]

# Crop and normalize, then rearrange to (batch, channels, frames, height, width)
transform_fn = video.VideoGroupValTransform(size=(224, 224), mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
clip_input = transform_fn(clip_input)
clip_input = np.stack(clip_input, axis=0)
clip_input = clip_input.reshape((-1,) + (32, 3, 224, 224))
clip_input = np.transpose(clip_input, (0, 2, 1, 3, 4))
print('Video data is read and preprocessed.')

# Running the prediction (net and CLASS_MAP are defined elsewhere and not shown here)
pred = net(nd.array(clip_input, ctx=mx.gpu(0)))
topK = 5
ind = nd.topk(pred, k=topK)[0].astype('int')
print('The input video clip is classified to be')
for i in range(topK):
    print('\t[%s], with probability %.3f.' %
          (CLASS_MAP[ind[i].asscalar()], nd.softmax(pred)[0][ind[i]].asscalar()))
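
For anyone unsure what the last few lines compute, here is a plain-Python sketch of the same idea: turn the raw scores into probabilities with softmax, then report the k highest-scoring classes. The helper names and the logits are made up for illustration; the real code uses nd.softmax and nd.topk on the network output.

```python
import math


def softmax(scores):
    """Convert raw scores to probabilities (max is subtracted for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(scores, k):
    """Return (class_index, probability) pairs for the k highest scores."""
    probs = softmax(scores)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, probs[i]) for i in order[:k]]


# Hypothetical scores for a 4-class problem
logits = [2.0, 0.5, 3.0, -1.0]
for idx, prob in top_k(logits, 2):
    print('class %d, probability %.3f' % (idx, prob))
```

Running the same report over a handful of training videos is a quick way to check whether the accuracy from the training graph also holds at inference time.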

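Solution

I found my mistake: the problem was caused by too little augmentation. I changed the transforms used for the training data loader and for inference as shown below, and now it works properly.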

from mxnet.gluon.data.vision import transforms
from gluoncv.data.transforms import video

transform_train = transforms.Compose([
    # Fix the input video frames size as 256×340 and randomly sample the cropping width and height from
    # {256,224,192,168}. After that, resize the cropped regions to 224 × 224.
    video.VideoMultiScaleCrop(size=(224, 224), scale_ratios=[1.0, 0.875, 0.75, 0.66]),
    # Randomly flip the video frames horizontally
    video.VideoRandomHorizontalFlip(),
    # Transpose the video frames from height*width*num_channels to num_channels*height*width
    # and map values from [0,255] to [0,1]
    video.VideoToTensor(),
    # Normalize the video frames with mean and standard deviation calculated across all images
    video.VideoNormalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
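
The crop sizes {256,224,192,168} mentioned in the comment follow from the scale ratios. Here is a small sketch of that arithmetic, assuming the base size is the shorter side of the 256×340 frame and that each product is truncated to an integer. The helper names are hypothetical and this is not the library's implementation.

```python
import random


def crop_sizes(base_size, scale_ratios):
    """Candidate crop sizes: base size times each ratio, truncated to int."""
    return [int(base_size * r) for r in scale_ratios]


def sample_crop(sizes, rng):
    """Pick crop width and height independently from the candidate sizes."""
    return rng.choice(sizes), rng.choice(sizes)


sizes = crop_sizes(256, [1.0, 0.875, 0.75, 0.66])
print(sizes)

rng = random.Random(0)          # seeded only to make the sketch repeatable
crop_w, crop_h = sample_crop(sizes, rng)
print(crop_w, crop_h)           # the crop is then resized to 224 x 224
```

Compared with scale_ratios=[1.0, 0.8] in the original loader, the four ratios plus the random horizontal flip give noticeably more variation per video.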
