Python: How to OCR characters crossed by a horizontal line

Date: 2016-12-14 18:37:08

Tags: python opencv ocr

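I have a batch of images that I want to scan. Some of them have a horizontal line running through the characters that need to be read, like this: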

Raw Image

I wrote a program that removes the horizontal line.

This returns the following image:

Clean Image

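So, do you have any idea how to OCR these characters, which are now crossed by a white line? Would you take a different approach from the one described?

If anything is unclear, please ask. Thanks.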

1 answer:

Answer 0 (score: 1)

Following @Rethunk's suggestion, I did the following:

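import cv2
import numpy as np

# Assumed to be defined already:
#   im_wb: 8-bit binary image, line and characters non-zero, background 0
#   img:   the image to clean (white background)

# Line parameters
threshold = 100        # minimum number of Hough votes (assumed value)
minLineLength = 100
maxLineGap = 10
color = 255
size = 1

# Detects the black line. minLineLength and maxLineGap are passed by keyword;
# positionally they would land in the threshold and lines parameters.
lines = cv2.HoughLinesP(im_wb, 1, np.pi/180, threshold,
                        minLineLength=minLineLength, maxLineGap=maxLineGap)
lines = lines.reshape(-1, 4)  # same layout for OpenCV 2.4 and 3+

# Makes a list of the y's of the segments touching the left and right edges
width = im_wb.shape[1]
y0_list = []
y1_list = []
for x0, y0, x1, y1 in lines:
    if x0 == 0:
        y0_list.append(y0)
    if x1 >= width - 1:  # the last valid column is width - 1
        y1_list.append(y1)

# Calculates line thickness and its half
thick = max(len(y0_list), len(y1_list))
hthick = thick // 2

# Initial and ending point of the full line
x0, x1 = 0, width
y0 = sum(y0_list) / len(y0_list)
y1 = sum(y1_list) / len(y1_list)

# Iterates over all x's and draws a vertical line of the desired thickness
# when the pixels just above and below the line are background
for x in range(x1):
    y = int(round(y0 + x * (y1 - y0) / x1))
    if im_wb[y+hthick+1, x] == 0 and im_wb[y-hthick-1, x] == 0:
        cv2.line(img, (x, y-hthick), (x, y+hthick), color, size)

cv2.imshow('clean', img)
cv2.waitKey(0)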

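So, since the HoughLinesP function returns the start and end points of the horizontal line segments, I collect the y coordinates of the segments that start at the left edge of the image and of those that end at the right edge. That gives me the equation of the full line (so it also works if the line is tilted), and I can iterate over all of its points. For each point, if the pixels directly above and below it are background (white), I erase it. The result is as follows: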

enter image description here
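
To make the per-column rule easier to follow, here is a minimal sketch of it on a tiny synthetic image, without OpenCV. The names `line_y` and `erase_line` are hypothetical helpers, not part of the answer's code, and the sketch assumes the line runs from column 0 to column `width - 1`. It indexes pixels as `mask[y][x]`, which works for nested lists and for NumPy arrays alike.

```python
def line_y(x, y0, y1, width):
    """Row of the line at column x, interpolated from (0, y0) to (width - 1, y1)."""
    if width < 2:
        return int(round(y0))
    return int(round(y0 + x * (y1 - y0) / (width - 1)))


def erase_line(mask, img, y0, y1, hthick, fill=255):
    """Erase the line column by column, skipping columns where a stroke touches it.

    mask: binary image, 0 = background, non-zero = ink (line or character).
    img:  image to clean, modified in place; erased pixels are set to `fill`.
    """
    height, width = len(mask), len(mask[0])
    for x in range(width):
        y = line_y(x, y0, y1, width)
        top, bottom = y - hthick - 1, y + hthick + 1
        if top < 0 or bottom >= height:
            continue  # neighbours fall outside the image; leave the column alone
        # Only erase when the pixels just above and below the line are background
        if mask[top][x] == 0 and mask[bottom][x] == 0:
            for yy in range(y - hthick, y + hthick + 1):
                img[yy][x] = fill
    return img


# Synthetic 7x9 image: a 1-pixel line on row 3 and a vertical
# character stroke in column 4 that crosses it.
width, height = 9, 7
mask = [[0] * width for _ in range(height)]
for x in range(width):
    mask[3][x] = 255      # the horizontal line
for y in range(1, 6):
    mask[y][4] = 255      # a stroke crossing the line

img = [[255 - v for v in row] for row in mask]  # dark ink on a white background
erase_line(mask, img, y0=3, y1=3, hthick=0)
row3 = img[3]
```

In this example the line is erased everywhere except at column 4, where the stroke continues above and below it, so the character stays connected instead of being cut by a white gap.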

If you have a better idea, please let us know!