Question

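The purpose of my application is to detect people. When I load the YOLOv2 weights and config, everything works. When I load the YOLOv3 weights and config, the net returns 0 for every class confidence on every bounding box. I tried both the regular YOLOv3-416 and YOLOv3-tiny from https://pjreddie.com/darknet/yolo/. As far as I know, YOLOv2 and YOLOv3 expect the same input and produce the same output. Please help me find out why YOLOv3 does not work. I am using OpenCV 4.01 with the Java wrapper, on CPU only. I searched for similar problems but found none.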

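import java.util.ArrayList;
import java.util.List;

// Logger and LogManager are assumed to come from Log4j 2.
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.opencv.core.Mat;
import org.opencv.core.Rect;
import org.opencv.core.Scalar;
import org.opencv.core.Size;
import org.opencv.dnn.Dnn;
import org.opencv.dnn.Net;

// StopWatch is the asker's own helper class (not shown in the post).
public class YoloAnalizer {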
private Net net;
private StopWatch stopWatch = new StopWatch();
private Logger logger = LogManager.getLogger();

private final double threshold = 0.5;
private final double scaleFactor = 1.0 / 255.000;
private final Size imageSize = new Size(416, 416);
private final Scalar mean = new Scalar(0,0,0);
private final boolean swapRB = true;
private final boolean crop = false;

private final String[] classes = new String[] {"person", "bicycle", "car", "motorcycle",
                                             "airplane", "bus", "train", "truck", "boat", "traffic light", "fire hydrant",
                                             "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse",
                                             "sheep", "cow", "elephant", "bear", "zebra", "giraffe", "backpack",
                                             "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis",
                                             "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard",
                                             "surfboard", "tennis racket", "bottle", "wine glass", "cup", "fork", "knife",
                                             "spoon", "bowl", "banana", "apple", "sandwich", "orange", "broccoli", "carrot", "hot dog",
                                             "pizza", "donut", "cake", "chair", "couch", "potted plant", "bed", "dining table",
                                             "toilet", "tv", "laptop", "mouse", "remote", "keyboard",
                                             "cell phone", "microwave", "oven", "toaster", "sink", "refrigerator",
                                 "book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush"};

public YoloAnalizer(String pathToYoloDarknetConfig, String pathToYoloDarknetWeights) {
    net = Dnn.readNetFromDarknet(pathToYoloDarknetConfig, pathToYoloDarknetWeights);
}

public List<Rect> AnalizeImage(Mat image) {
    logger.debug("Starting analisic image using yolo");
    stopWatch.StartTime();
    Mat blob = Dnn.blobFromImage(image, scaleFactor, imageSize, mean, swapRB, crop);
    net.setInput(blob);

    Mat prediction = net.forward();
    List<Rect> rects = ConvertPredictionToRoundingBox(prediction, image);
    logger.debug(String.format("Analising frame took: %s", stopWatch.GetElapsedMiliseconds()));
    return rects;
}

private List<Rect> ConvertPredictionToRoundingBox(Mat prediction, Mat image) {
    List<Rect> listOfPredictedObjects = new ArrayList<>();
    for (int i = 0; i < prediction.size().height; i++) {
        float[] row = new float[85];
        prediction.get(i, 0, row);

        float confidenceOnBox = row[4];
        int predictedClassConfidence = getTableIndexWithMaxValue(row, 5);
        double score = confidenceOnBox * row[predictedClassConfidence];
        if (score > threshold) {
            double x_center   = row[0] * image.width();
            double y_center   = row[1] * image.height();
            double width = row[2] * image.width();
            double height = row[3] * image.height();

            double left  = x_center - width * 0.5;
            double top  = y_center - height * 0.5;

            listOfPredictedObjects.add(new Rect((int)left, (int)top, (int)width, (int)height));
            logger.info(String.format("Found %s(%s) with confidence %s", classes[predictedClassConfidence-5],predictedClassConfidence, score));
        }
    }
    return listOfPredictedObjects;
}

private int getTableIndexWithMaxValue(float[] array, int startFrom) {
    double maxValue = -1;
    int maxIndex = -1;
    for (int i = startFrom; i < array.length; i++) {
        if (maxValue < array[i]) {
            maxIndex = i;
            maxValue = array[i];
        }
    }
    return maxIndex;
}

}
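
For readers who want to test the per-row logic of ConvertPredictionToRoundingBox in isolation, here is a minimal sketch that works on plain float arrays instead of an OpenCV Mat. The class YoloRowDecoder and its Detection type are hypothetical names introduced for illustration. The sketch assumes the same row layout as the code above (x, y, w, h, box confidence, then class scores). One thing that may be worth checking, which the post does not state: YOLOv3 configs define more than one [yolo] output layer, and the no-argument net.forward() returns the output of a single layer, so the sketch accepts a list of matrices rather than one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the original post or of OpenCV.
final class YoloRowDecoder {
    // Each row starts with x, y, w, h, box confidence; class scores follow.
    static final int FIRST_CLASS_INDEX = 5;

    /** One detection in pixel coordinates of the source image. */
    static final class Detection {
        final int classId, left, top, width, height;
        final double score;

        Detection(int classId, int left, int top, int width, int height, double score) {
            this.classId = classId;
            this.left = left;
            this.top = top;
            this.width = width;
            this.height = height;
            this.score = score;
        }
    }

    /** Decodes rows from one or more output matrices (assumed: one float[][] per output layer). */
    static List<Detection> decode(List<float[][]> outputs, int imageWidth,
                                  int imageHeight, double threshold) {
        List<Detection> result = new ArrayList<>();
        for (float[][] rows : outputs) {
            for (float[] row : rows) {
                if (row.length <= FIRST_CLASS_INDEX) {
                    continue; // no class scores in this row
                }
                // Find the index of the highest class score.
                int best = FIRST_CLASS_INDEX;
                for (int i = FIRST_CLASS_INDEX + 1; i < row.length; i++) {
                    if (row[i] > row[best]) {
                        best = i;
                    }
                }
                // Same scoring rule as the question's code: box confidence times class score.
                double score = row[4] * row[best];
                if (score <= threshold) {
                    continue;
                }
                // Coordinates are relative (0..1); convert the center-based box to a top-left box.
                double width = row[2] * imageWidth;
                double height = row[3] * imageHeight;
                double left = row[0] * imageWidth - width * 0.5;
                double top = row[1] * imageHeight - height * 0.5;
                result.add(new Detection(best - FIRST_CLASS_INDEX, (int) left, (int) top,
                        (int) width, (int) height, score));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        float[] row = new float[85];
        row[0] = 0.5f;  // x center
        row[1] = 0.5f;  // y center
        row[2] = 0.25f; // width
        row[3] = 0.5f;  // height
        row[4] = 0.75f; // box confidence
        row[5] = 0.75f; // score of class 0 ("person" in the question's list)

        List<float[][]> outputs = new ArrayList<>();
        outputs.add(new float[][] {row});
        for (Detection d : decode(outputs, 100, 200, 0.5)) {
            System.out.printf("class %d score %.4f at (%d, %d) size %dx%d%n",
                    d.classId, d.score, d.left, d.top, d.width, d.height);
        }
    }
}
```

Feeding it hand-built rows like the one in main makes it possible to separate a decoding bug from a network that really returns zero scores.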

Answer 1

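Here is what I found in v3:

In the function fill_truth_region:

The truth table is created in the format "1-classes-x-y-w-h", i.e. each entry in the truth table holds 1 + number of classes + 4 values.

In forward_yolo_layer, however, the box truth appears to be read as x, y, w, h starting from the beginning of the entry. So if an entry exists, x appears to get the value 1, and the class portion ends up in y, w, h.

I think if you change this line in forward_yolo_layer:

box truth=float_to_box(net.truth + t * 5 + b * l.truths, 1);

to this:

box truth=float_to_box(net.truth + t * (5+l.classes) + b * l.truths + l.classes+1, 1);

then you will get a truth box with the correct x, y, w, h.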
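
The index arithmetic in this answer can be sketched in Java as follows. This is only an illustration of the layout the answer describes (a presence flag, then one value per class, then x, y, w, h); it has not been checked against the darknet source. TruthLayout and its methods are hypothetical names, and truthsPerImage stands in for l.truths from the snippet above.

```java
// Hypothetical illustration of the truth layout described in the answer; not darknet code.
final class TruthLayout {
    /** Number of floats per truth entry: presence flag + class slots + x, y, w, h. */
    static int entrySize(int classes) {
        return 1 + classes + 4;
    }

    /** Index of x for truth entry t in batch item b, mirroring the answer's proposed offset. */
    static int boxOffset(int t, int b, int classes, int truthsPerImage) {
        return t * (5 + classes) + b * truthsPerImage + classes + 1;
    }

    /** Reads x, y, w, h for truth entry t in batch item b. */
    static float[] readBox(float[] truth, int t, int b, int classes, int truthsPerImage) {
        int offset = boxOffset(t, b, classes, truthsPerImage);
        return new float[] {
            truth[offset], truth[offset + 1], truth[offset + 2], truth[offset + 3]
        };
    }

    public static void main(String[] args) {
        int classes = 2;
        int truthsPerImage = 2 * entrySize(classes); // assumed: room for two entries per image
        float[] truth = new float[2 * truthsPerImage]; // assumed: batch of two images

        // Fill entry t = 1 of batch item b = 1 in the described layout.
        int start = 1 * entrySize(classes) + 1 * truthsPerImage;
        truth[start] = 1f;         // presence flag
        truth[start + 1 + 1] = 1f; // one-hot slot for class 1
        truth[start + 3] = 0.5f;   // x
        truth[start + 4] = 0.25f;  // y
        truth[start + 5] = 0.125f; // w
        truth[start + 6] = 0.75f;  // h

        float[] box = readBox(truth, 1, 1, classes, truthsPerImage);
        System.out.printf("x=%.3f y=%.3f w=%.3f h=%.3f%n", box[0], box[1], box[2], box[3]);
    }
}
```

With the original offset (t * 5 + b * l.truths), the same read would start at the presence flag or inside the class slots of some entry, which is the mismatch the answer points out.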
