如何解决寻求有关移动设备近实时对象检测检测图像中的垃圾的建议目标数据集方法论到目前为止我的脚步我当前的结果我身边的问题

大家好，

当前，我正在尝试构建应在移动设备上运行的近实时对象检测模型。当我刚接触计算机视觉这一特定领域时，我将不胜感激，因为它对我当前的进展提出了各种建议，也对我可以为实现该目标而做出的不同努力提供了反馈。

目标

目标是检测图像中的垃圾并将其分类为以下处理方法之一（3个目标类）：

黄色麻袋/罐头（德语）
纸
玻璃

此外，该模型应该是轻量级的，以便可以在移动设备上有效地运行它。

数据集

我正在使用垃圾桶数据集，该数据集包含2527张图像，这些图像分布在以下类别中：玻璃，纸张，塑料，垃圾，硬纸板，金属。这里值得注意的是每个图像只有一个项目。另外，每张图像的背景都是相同的（纯白色）。

https://www.kaggle.com/asdasdasasdas/garbage-classification

方法论

坦白地说，我正在关注Sentdex的YouTube教程，内容涉及Mac'n'cheese检测和this中等级别的枪支检测。因此，我使用Google Colab作为我的环境。我也在尝试重新训练预训练的模型（ssd_mobilenet_v2_coco_2018_03_29）。使用tensorflow API提供的方法（model_main.py和export_inference_graph.py）训练模型并导出推理图。我为此模型使用了来自tensorflow的samples config。

到目前为止我的脚步

我已经根据之前提到的“中型”帖子设置了类似于Colab Notebook的Google Colab环境。
我将数据分别按3/4和1/4分为训练和测试数据。
我使用流行的labelImg工具标记了数据，以便每个对象都有一个边界框。
我删除了对象占据整个空间或超出图像范围的所有图像，因为边界框没有太大意义。
我创建了label_map，csv和tfrecord文件。
我尝试使用initial_learning_rate，框预测器和特征抽取器的l2_regularizer > weight比率，设置use_dropout=true并增加了batch_size=32。

我当前的结果

我建立的大多数模型的AP / AR都很差，损失也很大，而且往往过拟合。此外，该模型一次只能在新图像中检测到一个物体（也许是由于数据集？）。

这是我的张量板的一些屏幕截图。这些是在大约12k步之后完成的。我认为这也是过度拟合的开始，因为AP突然升高并且预测的图像具有90-100％左右的准确度。

标量：

预测的图像：

我身边的问题

每个图像中只有一个对象是否有问题？在视频流上运行模型时，这可能会引起问题吗？
这些图像足以建立准确的模型吗？
你们中有人在这方面有经验吗，可以给我一些有关如何微调预训练模型的建议吗？
我还通过网络摄像头在视频流上运行了该模型，但是所有模型都倾向于对整个屏幕进行分类。因此，似乎该模型正在检测一个对象，但在整个屏幕上绘制了边界框。这可能与数据集的性质/模型质量差有关吗？

这是一篇很长的文章，所以在此先感谢您抽出时间阅读本文。我希望我能够阐明我的目标，并为你们提供足够的细节，以跟上我目前的进展。

我非常感谢每一个反馈！

最诚挚的问候

Jannik

针对预训练的ssd_mobilenet_v2_coco_2018_03_29模型的当前调整配置：

model {
  ssd {
    num_classes: 3
    Box_coder {
      faster_rcnn_Box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      IoU_similarity {
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    Box_predictor {
      convolutional_Box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        #use_dropout: false
        use_dropout: true
        dropout_keep_probability: 0.8
        kernel_size: 1
        Box_code_size: 4
        apply_sigmoid_to_scores: false
        conv_hyperparams {
          activation: RELU_6,regularizer {
            l2_regularizer {
              #weight: 0.00004
              weight: 0.001
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.03
              mean: 0.0
            }
          }
          batch_norm {
            train: true,scale: true,center: true,decay: 0.9997,epsilon: 0.001,}
        }
      }
    }
    feature_extractor {
      type: 'ssd_mobilenet_v2'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,regularizer {
          l2_regularizer {
            #weight: 0.00004
            weight: 0.001
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,}
      }
    }
    loss {
      classification_loss {
        weighted_sigmoid {
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        IoU_threshold: 0.99
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 3
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        IoU_threshold: 0.6
        max_detections_per_class: 1
        max_total_detections: 1
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  batch_size: 32
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.01
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "PATH"
  fine_tune_checkpoint_type:  "detection"
  # Note: The below line limits the training process to 200K steps,which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 200000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path:"PATH"
  }
  label_map_path: "PATH"
}

eval_config: {
  num_examples: 197
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  #max_evals: 10
  num_visualizations: 20
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "PATH"
  }
  label_map_path: "PATH"
  shuffle: false
  num_readers: 1
}

寻求有关移动设备近实时对象检测检测图像中的垃圾的建议 目标数据集方法论到目前为止我的脚步我当前的结果我身边的问题

如何解决寻求有关移动设备近实时对象检测检测图像中的垃圾的建议 目标数据集方法论到目前为止我的脚步我当前的结果我身边的问题

大家好，

目标

数据集

方法论

到目前为止我的脚步

我当前的结果

我身边的问题

寻求有关移动设备近实时对象检测检测图像中的垃圾的建议目标数据集方法论到目前为止我的脚步我当前的结果我身边的问题

如何解决寻求有关移动设备近实时对象检测检测图像中的垃圾的建议目标数据集方法论到目前为止我的脚步我当前的结果我身边的问题