Question

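In a very large image dataset, we have some corrupted images like the one shown below. These images can be opened and viewed without any problem, but some corrupted gray regions are visible to the human eye. How can these corrupted images be detected?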

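I have already written a detection script in Matlab. It filters out most of the corrupted images, but some are missed. The main idea of the script is to look for a binary string that the corrupted images have in common. However, some corrupted images do not contain this common binary string, so they are not filtered out.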

My Matlab code:

FOLDER1 = './'; % query data
query_folder1 = FOLDER1;
query_pt1 = dir(strcat(query_folder1, '*.jpg'));
nFile1 = length(query_pt1); % file number

BROKEN_MARK = '00455114';
SIZE = 4; % single size
THRESH = 3;

for i = 1:nFile1
    img_dir = strcat(FOLDER1, query_pt1(i).name());
    fid = fopen(img_dir);
    im1_stats = dir(img_dir);
    file_length = im1_stats.bytes;
    pos = -4;
    epost = -200;
    count = 0;
    while abs(pos) <= ceil(file_length)
        fseek(fid, pos, 'eof');
        temp = fread(fid, 1, 'single');
        str = num2hex(single(temp));
        if(strcmp(str, BROKEN_MARK))
            %fprintf('%s\n', img_dir);
            if(count >= THRESH)
                copyfile(img_dir, 'candidates/');
                break;
            else
                count = count + 1; % pos is left unchanged here, so the same 4 bytes are read again on the next pass
            end
        else
            count = 0;
            pos = pos - 1;
        end
    end
    fclose(fid);
end
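For comparison, the same byte-pattern idea can be sketched in Python. This is a simplified variant, not a line-for-line port of the Matlab loop: it flags a file when the marker occurs several times back to back anywhere in the file. The function names are hypothetical, and the byte order is an assumption (the hex string '00455114' is taken to come from a little-endian 4-byte read, so the bytes on disk would be 14 51 45 00).

```python
from pathlib import Path

# Assumption: '00455114' is the hex form of a value read little-endian,
# so the bytes stored in the file are in the reverse order.
BROKEN_MARK = bytes.fromhex("00455114")[::-1]


def has_marker_run(data: bytes, marker: bytes = BROKEN_MARK, min_repeats: int = 3) -> bool:
    """Return True if `marker` occurs `min_repeats` times back to back in `data`."""
    return marker * min_repeats in data


def find_candidates(folder: str, min_repeats: int = 3) -> list[str]:
    """List the .jpg files in `folder` whose bytes contain the repeated marker."""
    return sorted(
        str(path)
        for path in Path(folder).glob("*.jpg")
        if has_marker_run(path.read_bytes(), min_repeats=min_repeats)
    )
```

Calling `find_candidates('./')` would return the paths to review by hand. Like the Matlab script, it can only catch images that actually contain this particular byte pattern.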

Can anyone offer suggestions for detecting all of the corrupted images? Python, C++, Matlab or bash script code would also be welcome. Thanks.

Answer 1

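If you can view them, they are technically not corrupted. They are "corrupted" only in your perception. Personally, I would count all the gray pixels, and if their percentage is larger than a given amount, treat the image as "corrupted", check it manually and delete it.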

#!/usr/bin/env python3

from PIL import Image

percentage = 10           # Minimum corrupted area, in percent, to report
color = (128, 128, 128)   # Corruption color as an RGB tuple

# Convert to RGB so every pixel is a 3-tuple comparable with `color`
im = Image.open("corrupted.jpg").convert("RGB")

pixels = list(im.getdata())

# Count pixels that match the corruption color exactly
gray = 0
other = 0
for pix in pixels:
    if pix == color:
        gray += 1
    else:
        other += 1

# Integer percentage of the image covered by the corruption color
corruption_area = gray * 100 // (gray + other)

if corruption_area >= percentage:
    print('Corruption:', corruption_area, '%')
else:
    print('OK')

The output for the attached image below is

Corruption: 18 %

[image: the attached corrupted sample]
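To run the same check over many files, the counting can be wrapped in a function. The following is a minimal sketch, not a tested tool. The names `gray_percent` and `is_suspect` are hypothetical, and the `tolerance` parameter is an addition that is not part of the script above. It is there on the assumption that JPEG compression may leave the gray area slightly off (128, 128, 128). With `tolerance=0` the comparison is exact, as in the script.

```python
def gray_percent(pixels, color=(128, 128, 128), tolerance=0):
    """Percentage of RGB pixels within `tolerance` of `color` on every channel."""
    total = 0
    gray = 0
    for pix in pixels:
        total += 1
        if all(abs(p - c) <= tolerance for p, c in zip(pix, color)):
            gray += 1
    return 100.0 * gray / total if total else 0.0


def is_suspect(path, threshold=10, tolerance=0):
    """Flag an image file when its gray share reaches `threshold` percent."""
    from PIL import Image  # imported here so gray_percent works without Pillow

    with Image.open(path) as im:
        pixels = im.convert("RGB").getdata()
        return gray_percent(pixels, tolerance=tolerance) >= threshold
```

Files for which `is_suspect(path)` returns True would then go to the manual check. A legitimately gray photo would be flagged too, which is why the manual step stays.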

Answer 2

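I used the images already detected as corrupted to analyze the corrupted part, then used those results to detect other images that may be corrupted. Here is the Matlab code: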

clear; clc;
FOLDER1 = './';
query_folder1 = FOLDER1; % Some corrupted samples are here.
query_pt1 = dir(strcat(query_folder1, '*.jpg'));
nFile1 = length(query_pt1); % file number

OFF = 10;

for i = 1:nFile1
    img_dir = strcat(FOLDER1, query_pt1(i).name());
    img = imread(img_dir);
    [x y ~] = size(img);
    img_part = img(x-OFF:x, y-OFF:y, :); % Get samples from right-bottom corner
    hist(i, :) = rgbhist_fast(img_part, 4); % get RGB histogram of sample part of corrupted region
end

mean_hist = mean(hist); % Average RGB histogram of the samples, used as the reference for corruption

FOLDER2 = '~/data/logo_data/930k_iautocrop/'; % Main big dataset
query_folder2 = FOLDER2;
query_pt2 = dir(strcat(query_folder2, '*.jpg'));
nFile2 = length(query_pt2); % file number

for i = 1:nFile2
    if(mod(i, 100) == 0)
        fprintf('%d\n', i);
    end
    img_dir = strcat(FOLDER2, query_pt2(i).name());
    img = imread(img_dir);
    [x y ~] = size(img);
    img_part = img(x-OFF:x, y-OFF:y, :);
    temp_hist = rgbhist_fast(img_part, 4);
    dist(i) = sqrt(sum((mean_hist - temp_hist').^2, 2)); % Euclidean distance to the reference histogram (smaller = more similar)
    %imshow(img_part);
end

[v ix] = sort(dist, 'ascend'); % Sort ascending: images at the top of the list are the most likely to be corrupted
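The histogram comparison above can also be sketched in plain Python. This is an illustration under assumptions, not a port. `rgbhist_fast` is not a Matlab built-in, and its exact output is not shown here, so the sketch assumes a normalized joint histogram with 4 levels per channel (64 entries). All function names are hypothetical.

```python
import math


def rgb_hist(pixels, bins=4):
    """Normalized joint RGB histogram with `bins` levels per channel."""
    hist = [0.0] * bins ** 3
    n = 0
    for r, g, b in pixels:
        # Map each 0..255 channel value to a level in 0..bins-1
        idx = (r * bins // 256) * bins * bins + (g * bins // 256) * bins + (b * bins // 256)
        hist[idx] += 1
        n += 1
    return [h / n for h in hist] if n else hist


def mean_hist(hists):
    """Element-wise mean of equally sized histograms."""
    return [sum(col) / len(hists) for col in zip(*hists)]


def distance(h1, h2):
    """Euclidean distance between two histograms."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))


def rank_by_similarity(reference, patches):
    """Return patch indices sorted so the closest to `reference` comes first."""
    dists = [distance(reference, rgb_hist(p)) for p in patches]
    return sorted(range(len(patches)), key=dists.__getitem__)
```

Each patch is an iterable of RGB tuples from the bottom-right corner of an image. With Pillow, one way to get it would be `im.convert("RGB").crop((w - 11, h - 11, w, h)).getdata()`, where `w, h = im.size`. As in the Matlab version, this only helps when the corruption reaches that corner.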

How to detect partially corrupted images in JPG/JPEG format

2 answers: