Word segmentation in MATLAB

Time: 2016-09-12 11:02:29

Tags: matlab image-processing matrix text-segmentation

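Question: I first segment a line out of the input image and then extract the words from that line. My algorithm correctly segments the words of the first line (from the input image attached to this post) and then also segments the first word of the second line, but after that it fails with an error. Here is my code: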

img = imread ('G:\Stuff\RP\Database\0001_4.jpg');
%imshow(img);

bin_img = imcomplement(im2bw(img, 0.8)); %Binarizing
%figure;
%imshow(bin_img);

bin_img = bwareaopen(bin_img, 50); %for removing dots and commas

%%%%%%%% Line Segmentation %%%%%%%%
dbw_img = imdilate(bin_img, strel('line', 100, 0));%Dilating
[L, N]=bwlabel(dbw_img); %finding connected components
bbox = regionprops(L, 'BoundingBox');
lineSlopeMatrix=[N 0];
for i=1:N %must run for all the lines in an image
    bBox=bbox(i).BoundingBox;
    x=bBox(1)+0.5;
    y=bBox(2)+0.5;
    w=bBox(3);
    h=bBox(4);
    linePatch=bin_img(y:y+h,:); %Extracting line
    figure,imshow(linePatch) % Prints lines

    words_img = imdilate(linePatch, strel('line', 40, 0));%Dilating
    [R, C]=bwlabel(words_img); %finding connected components i.e. Words
    bounding = regionprops(R, 'BoundingBox');
    for j=1:C %must run for all the words in a line
        bdBox=bounding(j).BoundingBox;
        xAxis=bdBox(1)+0.5;
        yAxis=bdBox(2)+0.5;
        width=bdBox(3);
        height=bdBox(4);
        %         [row col]=size(linePatch)
        %         yAxis,yAxis+height,xAxis,xAxis+width
        Patch=linePatch(yAxis:yAxis+height,xAxis:xAxis+width); 
        %Extracting Patch of Words

        figure,imshow(Patch) %Prints words
        Patch=[];
    end
    linePatch=[];
end

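I understand what the error means and I checked the dimensions of the matrices. They look fine, or at least I cannot find the problem there.

See the attached images: Segmentation of Words from Line 1 // Segmentation of Words from Line 2

Please tell me the actual cause of this error and suggest a fix. Thanks :))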

1 answer:

Answer 0: (score: 0)

See the following documentation for the function you are using, regionprops:

http://de.mathworks.com/help/images/ref/regionprops.html#inputarg_properties

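There you can read the description of the BoundingBox property: the returned values (your xAxis and yAxis) specify the upper-left corner of the bounding box, in the form [x y ...], followed by the width and height. Because image rows are counted downward from that corner, a box that is height rows tall ends at row yAxis+height-1. So adding the full height (and likewise the full width) to get the far edge of the box is wrong: it overshoots by one pixel and runs past the matrix when a region touches the image border. You have to subtract 1 from that sum.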
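As a minimal sketch of that indexing, assuming a small synthetic image (the 5x8 matrix and the variable names below are made up for illustration and are not taken from the question's data), the last row and column of a region can be computed as start + size - 1:

```matlab
% Hypothetical 5x8 binary image with one blob that touches the
% right and bottom borders (illustration only).
img = false(5, 8);
img(3:5, 6:8) = true;

stats = regionprops(img, 'BoundingBox');
bb = stats(1).BoundingBox;     % [x y width height] = [5.5 2.5 3 3]

% The corner lies half a pixel before the first pixel center.
c1 = bb(1) + 0.5;              % first column: 6
r1 = bb(2) + 0.5;              % first row:    3

% Last column/row is start + size - 1, not start + size.
c2 = c1 + bb(3) - 1;           % last column: 8
r2 = r1 + bb(4) - 1;           % last row:    5

wordPatch = img(r1:r2, c1:c2); % 3x3 patch, stays inside the image

% img(r1:r1+bb(4), c1:c1+bb(3)) would request row 6 and column 9,
% which do not exist in a 5x8 image.
```

In the question's code the same adjustment would presumably apply to both `linePatch=bin_img(y:y+h,:)` and `Patch=linePatch(yAxis:yAxis+height,xAxis:xAxis+width)`, since each one reads one row or column past the bounding box.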