Preprocessing video for PyTorch on Android

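What is the best way to preprocess video data in Android Kotlin so that it can be fed to a PyTorch Android model? Specifically, I have a ready-made model in PyTorch that I have already converted for PyTorch Mobile.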

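During training, the model takes raw footage from the phone, which is preprocessed by (1) converting it to grayscale, (2) downscaling it to a specific smaller resolution that I specify, and (3) converting it to a tensor to feed into the neural network (or possibly sending the compressed video to a remote server). I use OpenCV for this, but what is the simplest way to do the same thing in Android Kotlin?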

Python code for reference:


import cv2
import numpy as np


def save_video(filename):

    frames = []

    cap = cv2.VideoCapture(filename)
    frameCount = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frameWidth = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frameHeight = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

    buf_c = np.empty((frameCount,frameHeight,frameWidth,3),np.dtype('uint8'))
    buf = np.empty((frameCount,frameHeight,frameWidth),np.dtype('uint8'))

    fc = 0
    ret = True

    # 9:16 ratio
    width = 121
    height = 216
    dim = (width,height)

    # Loop until the end of the video
    while fc < frameCount:
        ret,frame = cap.read()
        if not ret:
            # CAP_PROP_FRAME_COUNT can overestimate; stop at the first failed read
            break
        buf_c[fc] = frame

        # convert to greyscale
        buf[fc] = cv2.cvtColor(buf_c[fc],cv2.COLOR_BGR2GRAY)

        # reduce resolution
        resized = cv2.resize(buf[fc],dim,interpolation = cv2.INTER_AREA)

        frames.append(resized)
        fc += 1

    # release the video capture object
    cap.release()

    # Closes all the windows currently opened.
    cv2.destroyAllWindows()

    return frames

Solution

You say your model has been converted for use with PyTorch Mobile, so I assume you scripted it with TorchScript.

With TorchScript, you can write the preprocessing logic using Torch operations and save it inside the scripted model, like this:

import torch
import torch.nn.functional as F

# Method of the scripted module class; relies on self.device and self.mean_tensor
@torch.jit.script_method
def preprocess(self, image: torch.Tensor,  # This should have format HxWx3
               height: int, width: int) -> torch.Tensor:
    # Convert to float first so that summing uint8 channels cannot overflow
    img = image.to(self.device).float()

    # (1) Convert to grayscale with a plain channel average
    # (cv2.COLOR_BGR2GRAY uses weighted channels, so values differ slightly)
    img = ((img[:, :, 0] + img[:, :, 1] + img[:, :, 2]) / 3).unsqueeze(-1)

    # (2) Resize to specified resolution
    # Mimic torchvision.transforms.ToTensor (HxWxC -> 1xCxHxW) to use interpolate
    img = img.permute(2, 0, 1).unsqueeze(0)
    img = F.interpolate(img, size=[height, width],
                        mode="bicubic", align_corners=False)
    # Then turn it back to a normal HxWxC image tensor
    img = img.squeeze(0).permute(1, 2, 0)

    # (3) Other normalization like mean subtraction, then convert to BxCxHxW format
    img -= self.mean_tensor  # mean subtraction
    img = img.permute(2, 0, 1).unsqueeze(0)
    return img

This way, all of the preprocessing is done by libtorch rather than OpenCV.
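
To make the idea concrete, here is a minimal sketch of how the preprocessing could be bundled with a model and scripted as one unit. Everything named here is an assumption for illustration: the `WithPreprocessing` wrapper, the toy stand-in network, the 216x121 target size (taken from the question), and the mean value of 127.5. It uses `torch.jit.export` on a regular `nn.Module` rather than the `script_method` decorator, and a plain channel average for grayscale.

```python
import io

import torch
import torch.nn as nn
import torch.nn.functional as F


class WithPreprocessing(nn.Module):
    """Hypothetical wrapper: preprocesses a raw HxWx3 frame, then runs the model."""

    def __init__(self, model: nn.Module, height: int, width: int, mean: float):
        super().__init__()
        self.model = model
        self.height = height
        self.width = width
        # A buffer is stored inside the scripted module when it is saved.
        self.register_buffer("mean_tensor", torch.tensor(mean))

    @torch.jit.export
    def preprocess(self, image: torch.Tensor) -> torch.Tensor:
        # Float first, so averaging uint8 channels cannot overflow.
        img = image.float()
        # (1) Grayscale as a plain channel average: HxWx3 -> HxW
        gray = img.mean(dim=2)
        # (2) Resize; interpolate expects a 1xCxHxW batch.
        gray = gray.unsqueeze(0).unsqueeze(0)
        gray = F.interpolate(gray, size=[self.height, self.width],
                             mode="bicubic", align_corners=False)
        # (3) Mean subtraction; result is already in BxCxHxW format.
        return gray - self.mean_tensor

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.model(self.preprocess(image))


# Toy stand-in for the real network (assumption, for illustration only).
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(216 * 121, 2))
scripted = torch.jit.script(
    WithPreprocessing(toy_model, height=216, width=121, mean=127.5))

# A uniform gray frame in HxWx3 uint8 layout, standing in for a decoded video frame.
frame = torch.full((64, 36, 3), 100, dtype=torch.uint8)
batch = scripted.preprocess(frame)
output = scripted(frame)

# Round-trip through an in-memory file to show the preprocessing is saved too.
buffer = io.BytesIO()
torch.jit.save(scripted, buffer)
buffer.seek(0)
reloaded = torch.jit.load(buffer)
```

Because `forward` calls `preprocess` itself, the Android side would only need to hand the raw frame tensor to the module's forward method; no separate resizing or grayscale code is needed in Kotlin.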