Question

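I need the mighty help of Stack Overflow. The application I'm working on has to analyze documents through OCR (I use tesseract) and extract all the text it can from them. Here is an example of the type of image:

This is what I do in preprocessing to get rid of all the lines. In the future I may also need to analyze each "rectangle" separately (sending the area delimited by the given lines to tesseract), so I guess there is a simpler way than this, but then I would not have the line coordinates.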

package formRecog;

import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.Point;
import org.opencv.core.Scalar;
import org.opencv.core.Size;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;
import static org.opencv.core.Core.bitwise_not;
import org.opencv.core.MatOfPoint;


public class testMat {

    public static void main(String[] args) {

        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

        Mat source  = Imgcodecs.imread("./image.png",Imgcodecs.CV_LOAD_IMAGE_ANYCOLOR);
        Mat destination  = new Mat(source.rows(), source.cols(), source.type());
        Imgproc.cvtColor(source, destination, Imgproc.COLOR_RGB2GRAY);  
        Imgcodecs.imwrite("gray.jpg", destination);

        Imgproc.GaussianBlur(destination, destination, new Size(3, 3), 0, 0, Core.BORDER_DEFAULT);  

        Imgproc.Canny(destination, destination, 30, 90);
        Imgcodecs.imwrite("postcanny.jpg", destination);

        Mat houghlines = new Mat(); 
        Imgproc.HoughLinesP(destination, houghlines, 1, Math.PI / 180,  250, 185,5);

        // Draw the detected lines
        Mat result = new Mat(source.rows(), source.cols(), source.type());
        for (int i = 0; i < houghlines.rows(); i++) {
            double[] val = houghlines.get(i, 0);
            Imgproc.line(destination, new Point(val[0], val[1]), new Point(val[2], val[3]), new Scalar(0, 0, 255), 5);
            Imgproc.line(result, new Point(val[0], val[1]), new Point(val[2], val[3]), new Scalar(0, 0, 255),5);
        }

        Imgcodecs.imwrite("lines.jpg", result);

        Mat contourImg = new Mat(source.rows(), source.cols(), source.type());
        List<MatOfPoint> contours = new ArrayList<MatOfPoint>();
        Mat hierarchy = new Mat();
        //Point offset = new Point();

        Imgproc.findContours(destination, contours, hierarchy, Imgproc.RETR_LIST, Imgproc.CHAIN_APPROX_NONE );
        Imgproc.drawContours(contourImg, contours, -1, new Scalar(255, 0, 0),-1);

        Imgcodecs.imwrite("contour.jpg", contourImg);

        bitwise_not(destination,destination);


        Imgcodecs.imwrite("final.jpg", destination);

    }
}

Here is the final image:

Final image after processing

The problem is that tesseract does not read anything sensible from this:

11mËEZË@ÜDS@ 7 C @mpû@515îf@ 5 @ ??ûäû ©m m @@@ @@vësw?? a？ PF©@MÜGS@“@ X @Ü©ÜÎÊQÜ©IÏÙ1111 175515

That is what I get for the first "line".

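I think this is because the letters are no longer "filled" and tesseract cannot read them; tesseract actually gave me fairly good results without the line-removal step. I would like to fill these letters with black, but

Imgproc.drawContours(contourImg, contours, -1, new Scalar(255, 0, 0), -1);

does not do anything, although I am fairly sure findContours works fine: if I write its result to an image, I get the same image as before.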

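I have searched for similar questions (cv2.drawContours will not draw filled contour, and Contour shows dots rather than a curve when retrieving it from the list, but shows the curve otherwise) but found nothing I could use (or maybe I just did not get it).

Just so you know, I started coding in, like, September, so I am really new to all this (please forgive me if there is something horrible in here), but I did not choose the problem I am working on :)

I hope I have been clear enough and that my English is not too bad.

My thanks.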

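Edit: Thanks to Rick.M it is getting better, using CHAIN_APPROX_SIMPLE in findContours and iterating over idx in drawContours. New final

Is there any way to improve this result? I am guessing tesseract will not swallow this? Thanks

Uploading the post-Canny image: Image after canny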

Answer 1

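Imgproc.drawContours(contourImg, contours, -1, new Scalar(255, 0, 0), -1); does not work as required because of the flag: CHAIN_APPROX_NONE stores absolutely all the contour points. Using CHAIN_APPROX_SIMPLE instead, which compresses horizontal, vertical, and diagonal segments and leaves only their end points, gives you completed contours. In that case you can also call drawContours without the loop, and it should work fine.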
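To make the difference between the two flags concrete, here is a minimal plain-Java sketch of the idea behind CHAIN_APPROX_SIMPLE. It is only an illustration, not OpenCV's implementation: the class ChainApproxSketch and the method compress are hypothetical names, and the input is assumed to be a closed outline listed pixel by pixel, which is roughly what CHAIN_APPROX_NONE gives you.

```java
// Plain-Java illustration (NOT OpenCV code) of what CHAIN_APPROX_SIMPLE does:
// points in the middle of a straight horizontal, vertical, or diagonal run
// are dropped, and only the points where the direction changes are kept.
class ChainApproxSketch {

    // points: a closed outline as {x, y} pairs, each one a neighbour of the next.
    static int[][] compress(int[][] points) {
        int n = points.length;
        if (n < 3) {
            return points;
        }
        java.util.List<int[]> kept = new java.util.ArrayList<>();
        for (int i = 0; i < n; i++) {
            int[] prev = points[(i + n - 1) % n];
            int[] cur = points[i];
            int[] next = points[(i + 1) % n];
            int dxIn = cur[0] - prev[0], dyIn = cur[1] - prev[1];
            int dxOut = next[0] - cur[0], dyOut = next[1] - cur[1];
            // Keep the point only if the step direction changes here.
            if (dxIn != dxOut || dyIn != dyOut) {
                kept.add(cur);
            }
        }
        return kept.toArray(new int[0][]);
    }

    public static void main(String[] args) {
        // Outline of a 3x3 pixel square, listed pixel by pixel.
        int[][] full = { {0, 0}, {1, 0}, {2, 0}, {2, 1}, {2, 2}, {1, 2}, {0, 2}, {0, 1} };
        int[][] simple = compress(full);
        System.out.println(full.length + " points -> " + simple.length + " points");
    }
}
```

Both point lists describe the same square; the compressed one just keeps the four corners. In the question's code the switch itself is the last argument of Imgproc.findContours.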

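Now, regarding the discussion in the comments: the Canny image looks fine, but as you can see after zooming in, the letters that are not detected are not completely connected. I would suggest using erosion with a small kernel (you will have to play with the parameters) to get better results.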
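To show why a small erosion can reconnect broken strokes, here is a rough plain-Java sketch of a 3x3 erosion on a tiny grayscale grid. It assumes black ink (0) on a white background (255), as in the question's image after bitwise_not; on a white-on-black image such as the raw Canny output, the equivalent operation would be a dilation. ErosionSketch and erode3x3 are made-up names for this illustration. In OpenCV itself the corresponding calls are Imgproc.erode with a kernel built by Imgproc.getStructuringElement.

```java
// Plain-Java illustration (NOT OpenCV code) of a 3x3 erosion:
// every output pixel becomes the minimum of its 3x3 neighbourhood,
// so dark (ink) regions grow by one pixel in every direction.
class ErosionSketch {

    static int[][] erode3x3(int[][] img) {
        int h = img.length, w = img[0].length;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int min = 255;
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dx = -1; dx <= 1; dx++) {
                        int yy = y + dy, xx = x + dx;
                        // Neighbours outside the image are simply skipped.
                        if (yy >= 0 && yy < h && xx >= 0 && xx < w) {
                            min = Math.min(min, img[yy][xx]);
                        }
                    }
                }
                out[y][x] = min;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        final int WHITE = 255; // background; 0 is black ink
        // A horizontal stroke on row 2 with a one-pixel gap at column 3.
        int[][] img = {
            { WHITE, WHITE, WHITE, WHITE, WHITE, WHITE, WHITE },
            { WHITE, WHITE, WHITE, WHITE, WHITE, WHITE, WHITE },
            { 0,     0,     0,     WHITE, 0,     0,     0     },
            { WHITE, WHITE, WHITE, WHITE, WHITE, WHITE, WHITE },
            { WHITE, WHITE, WHITE, WHITE, WHITE, WHITE, WHITE },
        };
        int[][] out = erode3x3(img);
        System.out.println("gap pixel before: " + img[2][3] + ", after: " + out[2][3]);
    }
}
```

The gap is closed, but the stroke also becomes three pixels thick. That is why the kernel has to stay small: a larger one would start merging neighbouring letters.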

drawContours not filling / JAVA - OpenCV
