Problem detecting small objects with TensorFlow 1.15 and Python 3.6.7


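I am trying to detect small alphabetic characters and digits (a-z, A-Z, 0-9, umlauts, etc.) with TensorFlow. The images have a resolution of 2000x1000 and the labels are roughly 18x24 pixels. I have trained with several different pretrained models, but the detector never finds a single object in the test images. Even after 85,000 steps, nothing is detected.

The training and test images contain up to 69 different classes (the number of instances of each class on a single image varies from 1 to 30 or more). I wrote a script that takes every image containing several classes, copies it multiple times and separates the classes, which gave me a large number of training images (about 6000). Each image is now labelled with only 1 class (the number of instances of that class per image still varies between 1 and 30), and all images have the same resolution (2000x1000).

I also tried cropping every labelled instance out of each image and saving it as its own image, which gave me about 35,000 small images (roughly 18x24 pixels; the resolution varies with the size of the character or digit), but after 70,000 steps the detector still did not detect anything.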

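When I use the faster_rcnn_inception_v2_pets model, the total loss decreases steadily during training, but occasionally there is a single step where the loss explodes, and after a while these "exploding" losses become more frequent.

With the faster_rcnn_inception_resnet_v2_atrous_coco model, on the other hand, the loss jumps between 0.07 and 1.5 right after training starts and sometimes even explodes into the thousands.

I also tried the ssd_mobilenet_v2_coco pretrained model, but detection did not work either.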

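I have double-, triple- and quadruple-checked the labelmap.pbtxt and generate_tfrecord.py files, and they look fine.

Is it possible for the detector to never find a single object, even if I lower min_score_thresh to 0.1?

Can anyone help me solve this problem?

Thanks for reading, any help is much appreciated!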

My setup:

TensorFlow version: 1.15-gpu

Python version: 3.6.7

Num_Classes: 69

Training dataset: 6000 images

Pretrained models: faster_rcnn_inception_v2_pets, faster_rcnn_inception_resnet_v2_atrous_coco, ssd_mobilenet_v2_coco

Config file for faster_rcnn_inception_v2_pets (the other config files use the same settings):

model {
  faster_rcnn {
    num_classes: 69
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 1000
        max_dimension: 2000
      }
    }
    feature_extractor {
      type: 'faster_rcnn_inception_v2'
      first_stage_features_stride: 16
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.05,0.1,0.15,0.25,0.5,1.0,2.0]
        aspect_ratios: [0.25,2.0,2.5]
        height_stride: 16
        width_stride: 16
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 300
      }
      score_converter: softmax
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
  }
}

train_config: {
  batch_size: 1
  optimizer {
    adam_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0002
          schedule {
            step: 900000
            learning_rate: .00002
          }
          schedule {
            step: 1200000
            learning_rate: .000002
          }
        }
      }
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "C:/xxx/object_detection/faster_rcnn_inception_v2_coco_2018_01_28/model.ckpt"
  from_detection_checkpoint: true
  load_all_detection_checkpoint_vars: true

  num_steps: 200000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}


train_input_reader: {
  tf_record_input_reader {
    input_path: "C:/xxx/object_detection/train.record"
  }
  label_map_path: "C:/xxx/object_detection/training/labelmap.pbtxt"
}

eval_config: {
  metrics_set: "coco_detection_metrics"
  num_examples: 1101
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "C:/xxx/object_detection/test.record"
  }
  label_map_path: "C:/xxx/object_detection/training/labelmap.pbtxt"
  shuffle: false
  num_readers: 1
}
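
To relate the anchor settings in this config to the 18x24 pixel labels, here is a rough sketch of the anchor sizes that `scales` and `aspect_ratios` would produce. It assumes the grid anchor generator's default 256x256 base anchor (the config does not set `height` or `width`) and that aspect ratio means width divided by height; the helper names are made up for illustration and are not part of the Object Detection API.

```python
import math

BASE_ANCHOR = 256  # assumption: default base anchor size in pixels
SCALES = [0.05, 0.1, 0.15, 0.25, 0.5, 1.0, 2.0]   # from the config above
ASPECT_RATIOS = [0.25, 2.0, 2.5]                  # from the config above


def anchor_sizes(scales, aspect_ratios, base=BASE_ANCHOR):
    """Return (width, height) in pixels for every scale/aspect-ratio pair."""
    sizes = []
    for scale in scales:
        for ratio in aspect_ratios:
            root = math.sqrt(ratio)
            sizes.append((scale * root * base, scale / root * base))
    return sizes


def centered_iou(w1, h1, w2, h2):
    """IoU of two boxes that share the same center (best-case alignment)."""
    inter = min(w1, w2) * min(h1, h2)
    return inter / (w1 * h1 + w2 * h2 - inter)


def best_anchor(label_w, label_h):
    """Return (iou, anchor_w, anchor_h) for the best-overlapping anchor."""
    return max(
        (centered_iou(w, h, label_w, label_h), w, h)
        for w, h in anchor_sizes(SCALES, ASPECT_RATIOS)
    )


if __name__ == "__main__":
    iou, w, h = best_anchor(18, 24)
    print(f"best anchor {w:.1f}x{h:.1f}, IoU {iou:.2f}")
```

Under these assumptions no anchor reaches an IoU of 0.5 with a perfectly centered 18x24 box, partly because none of the aspect ratios is close to 18/24 = 0.75. Whether this is what prevents detections here is not certain, but it may be worth checking.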

Solution

Start with a smaller model first, such as ResNet-18 or ResNeXt.

If you are training directly on the high-resolution images, train on smaller images first.
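
One possible reading of "smaller images", sketched here as an assumption rather than as what this answer necessarily intends, is to cut each 2000x1000 image into overlapping tiles so that the 18x24 labels keep their pixel size. The function names, the 500 pixel tile size and the 50 pixel minimum overlap are hypothetical choices for illustration.

```python
import math


def tile_origins(length, tile, min_overlap):
    """Evenly spaced start offsets so tiles cover [0, length) with overlap."""
    if tile >= length:
        return [0]
    # Number of tiles needed so neighbours share at least min_overlap pixels.
    count = math.ceil((length - min_overlap) / (tile - min_overlap))
    return [round(i * (length - tile) / (count - 1)) for i in range(count)]


def tile_windows(width, height, tile=500, min_overlap=50):
    """Return (x_min, y_min, x_max, y_max) crop windows, row by row."""
    return [
        (x, y, x + tile, y + tile)
        for y in tile_origins(height, tile, min_overlap)
        for x in tile_origins(width, tile, min_overlap)
    ]


if __name__ == "__main__":
    for window in tile_windows(2000, 1000):
        print(window)
```

The overlap is there so that a character cut by one tile border appears whole in a neighbouring tile; the label boxes would still have to be shifted into each tile's coordinates before writing the TFRecords.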

Try StepLR or CyclicLR. Use data augmentation.
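
StepLR and CyclicLR are scheduler names from PyTorch (`torch.optim.lr_scheduler`), while the question uses TensorFlow, so the sketch below only reproduces the basic shape of the two schedules in plain Python. The function names and all parameter values other than the 0.0002 starting rate from the question's config are illustrative assumptions.

```python
import math


def step_lr(base_lr, step_size, gamma, step):
    """Step decay: multiply the rate by gamma every step_size steps."""
    return base_lr * gamma ** (step // step_size)


def cyclic_lr(base_lr, max_lr, half_cycle, step):
    """Triangular cyclic schedule: ramp base -> max -> base repeatedly."""
    cycle = math.floor(1 + step / (2 * half_cycle))
    x = abs(step / half_cycle - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


if __name__ == "__main__":
    for step in (0, 10000, 20000, 30000, 40000):
        print(
            step,
            step_lr(0.0002, step_size=20000, gamma=0.1, step=step),
            cyclic_lr(0.00002, 0.0002, half_cycle=10000, step=step),
        )
```

Either function could be used to pick the values of a manual step schedule, or simply to plot what a schedule would look like before committing to a long training run.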

Most importantly, though, try one step at a time and add only one thing at each step.

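And since you are using a pretrained model: are you detaching the head and then training the last layers on your dataset? Try freezing most of the layers at the start, then unfreeze toward the end and the head as needed. In any case, try detaching the current head of the pretrained model and training a new one on the classes in your dataset.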
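
The freeze-then-unfreeze idea can be sketched independently of any framework. The layer names, the `trainable_layers` helper and the five-epoch interval below are hypothetical; in a real setup the returned names would decide which variables the optimizer is allowed to update.

```python
# Hypothetical layer names: a pretrained backbone plus a freshly added head.
BACKBONE = ["conv1", "conv2", "conv3", "conv4"]
HEAD = ["new_box_head", "new_class_head"]


def trainable_layers(backbone, head, epoch, unfreeze_every=5):
    """Return the layer names to train at the given epoch.

    The new head is always trainable. Backbone layers start frozen and are
    unfrozen one at a time, last layer first, every `unfreeze_every` epochs.
    """
    unfrozen_count = min(len(backbone), epoch // unfreeze_every)
    unfrozen = backbone[len(backbone) - unfrozen_count:]
    return unfrozen + head


if __name__ == "__main__":
    for epoch in (0, 5, 10, 20):
        print(epoch, trainable_layers(BACKBONE, HEAD, epoch))
```

Training only the head at first keeps the pretrained features intact while the new 69-class outputs settle; the later epochs then adapt progressively more of the backbone.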