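In a very large image dataset we have some corrupted images, like the one shown below. These images open and display without any problem, but the human eye can see corrupted regions filled with gray. How can I detect these corrupted images?
I have already written a detection script in Matlab. It filters out most of the corrupted images, but some are missed. The main idea of my script is to look for a binary string that the corrupted images have in common. Some corrupted images do not contain this common binary string, however, so they are not filtered out.
My Matlab code: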
    FOLDER1 = './'; % query data
    query_folder1 = FOLDER1;
    query_pt1 = dir(strcat(query_folder1, '*.jpg'));
    nFile1 = length(query_pt1); % file number
    BROKEN_MARK = '00455114'; % hex of the 4-byte word shared by corrupted files
    SIZE = 4; % bytes in a single
    THRESH = 3;
    for i = 1:nFile1
        img_dir = strcat(FOLDER1, query_pt1(i).name);
        fid = fopen(img_dir);
        im1_stats = dir(img_dir);
        file_length = im1_stats.bytes;
        pos = -SIZE; % offset from the end of the file
        count = 0;
        while abs(pos) <= file_length
            fseek(fid, pos, 'eof');
            temp = fread(fid, 1, 'single');
            str = num2hex(single(temp));
            if strcmp(str, BROKEN_MARK)
                %fprintf('%s\n', img_dir);
                if count >= THRESH
                    copyfile(img_dir, 'candidates/');
                    break;
                else
                    count = count + 1;
                    pos = pos - SIZE; % move to the preceding 4-byte word
                end
            else
                count = 0;
                pos = pos - 1;
            end
        end
        fclose(fid);
    end
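For readers without Matlab, the scan above can be sketched in Python. This is only a sketch under two assumptions: that the intent is to find the mark repeated several times in a row (four here, matching `THRESH = 3` plus the final hit), and that the Matlab hex string `'00455114'` corresponds to the bytes `14 51 45 00` on disk because `fread` read the float in little-endian order. The names `has_repeated_marker` and `find_candidates` are hypothetical.

```python
from pathlib import Path

# Assumption: '00455114' is the most-significant-first hex of a float that was
# read little-endian, so the raw bytes in the file are 14 51 45 00.
BROKEN_MARK = bytes.fromhex("14514500")


def has_repeated_marker(data: bytes, marker: bytes = BROKEN_MARK,
                        min_repeats: int = 4) -> bool:
    """Return True if `marker` occurs `min_repeats` times back to back in `data`."""
    # A substring search checks every byte offset, like the byte-by-byte scan.
    return marker * min_repeats in data


def find_candidates(folder: str, pattern: str = "*.jpg") -> list:
    """List files in `folder` whose raw bytes contain the repeated marker."""
    return [p for p in sorted(Path(folder).glob(pattern))
            if has_repeated_marker(p.read_bytes())]
```

Calling `find_candidates('./')` would return the paths to review. Like the Matlab version, it can only find files that actually contain the marker, so it shares the same blind spot.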
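Can anyone offer suggestions for detecting all of the corrupted images? Code in Python, C++, Matlab, or a bash script would also be welcome. Thanks.
Answer 0 (score: 0)
If you can view them, they are technically not corrupted. They are "corrupted" only in your perception. Personally, I would count all the gray pixels, and if their percentage is greater than a given amount, treat the image as "corrupted", check it manually, and delete it.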
    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    from PIL import Image

    percentage = 10  # Corrupted area, in percent, at which an image is flagged
    color = (128, 128, 128)  # Corruption color as an RGB tuple

    im = Image.open("corrupted.jpg").convert("RGB")  # force RGB so pixels are 3-tuples
    pixels = list(im.getdata())

    gray = 0
    other = 0
    for pix in pixels:
        if pix == color:
            gray += 1
        else:
            other += 1

    corruption_area = gray * 100 // (gray + other)  # whole percent
    if corruption_area >= percentage:
        print('Corruption:', corruption_area, '%')
    else:
        print('OK')
The output for the attached image below is:
Corruption: 18 %
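Because JPEG is lossy, pixels in a gray region may decode to values slightly off (128, 128, 128), so an exact comparison can undercount. A minimal sketch of a tolerant variant follows; the function name `gray_percentage` and the default `tolerance=8` are assumptions chosen for illustration, not values from the answer.

```python
def gray_percentage(pixels, color=(128, 128, 128), tolerance=8):
    """Percentage (0-100) of RGB pixels within `tolerance` of `color` on every channel."""
    total = 0
    gray = 0
    for pix in pixels:
        total += 1
        # A pixel counts as gray only if all three channels are close to `color`.
        if all(abs(c - ref) <= tolerance for c, ref in zip(pix, color)):
            gray += 1
    return 100.0 * gray / total if total else 0.0
```

With Pillow, the `pixels` argument could be `Image.open(path).convert("RGB").getdata()`. A larger tolerance catches more degraded gray but also flags images that legitimately contain mid-gray areas, so a manual check remains useful.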
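Answer 1 (score: 0)
I used images that had already been detected as corrupted to analyze the corrupted region, and then used those results to detect other possibly corrupted images. Here is the Matlab code: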
    clear; clc;
    FOLDER1 = './';
    query_folder1 = FOLDER1; % Some corrupted samples are here.
    query_pt1 = dir(strcat(query_folder1, '*.jpg'));
    nFile1 = length(query_pt1); % file number
    OFF = 10;
    for i = 1:nFile1
        img_dir = strcat(FOLDER1, query_pt1(i).name);
        img = imread(img_dir);
        [x, y, ~] = size(img);
        img_part = img(x-OFF:x, y-OFF:y, :); % Sample patch from the bottom-right corner
        hists(i, :) = rgbhist_fast(img_part, 4); % RGB histogram of the corrupted sample patch
    end
    mean_hist = mean(hists); % Average histogram of the samples is the corruption reference

    FOLDER2 = '~/data/logo_data/930k_iautocrop/'; % Main big dataset
    query_folder2 = FOLDER2;
    query_pt2 = dir(strcat(query_folder2, '*.jpg'));
    nFile2 = length(query_pt2); % file number
    for i = 1:nFile2
        if mod(i, 100) == 0
            fprintf('%d\n', i); % progress
        end
        img_dir = strcat(FOLDER2, query_pt2(i).name);
        img = imread(img_dir);
        [x, y, ~] = size(img);
        img_part = img(x-OFF:x, y-OFF:y, :);
        temp_hist = rgbhist_fast(img_part, 4);
        dist(i) = sqrt(sum((mean_hist - temp_hist').^2, 2)); % distance to the corruption reference
        %imshow(img_part);
    end
    [v, ix] = sort(dist, 'ascend'); % Smallest distance first: images at the top are most likely corrupted
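The same ranking idea can be sketched in plain Python. The source does not show `rgbhist_fast`, so `rgb_histogram` below is a stand-in that assumes a normalized joint histogram with 4 levels per channel (64 cells). All function names are hypothetical, and each patch is assumed to be a list of RGB tuples taken from the bottom-right corner of an image.

```python
import math


def rgb_histogram(pixels, bins=4):
    """Normalized joint RGB histogram with `bins` levels per channel (bins**3 cells)."""
    hist = [0.0] * (bins ** 3)
    count = 0
    for r, g, b in pixels:
        # Map each 0-255 channel value to a level in 0..bins-1.
        idx = (r * bins // 256) * bins * bins + (g * bins // 256) * bins + (b * bins // 256)
        hist[idx] += 1
        count += 1
    return [h / count for h in hist] if count else hist


def mean_histogram(histograms):
    """Cell-wise average of several histograms of equal length."""
    n = len(histograms)
    return [sum(cells) / n for cells in zip(*histograms)]


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_corruption(reference, patches):
    """Return patch indices ordered so the closest match to `reference` comes first."""
    distances = [euclidean_distance(reference, rgb_histogram(p)) for p in patches]
    return sorted(range(len(patches)), key=distances.__getitem__)
```

Here `reference` would be `mean_histogram` applied to the histograms of known-corrupted patches. As in the Matlab code, only the bottom-right corner is compared, so corruption elsewhere in the image would not be ranked highly.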