Question

I have a large number of line snippets, for example:

With some OpenCV magic (I'm still trying to understand how OpenCV works), I can get the contours of the characters drawn onto an empty canvas:

import cv2
import numpy as np
import matplotlib.pyplot as plt

example = "line_snippet.png"  # hypothetical path to one of the line snippets

img = cv2.imread(example)
imgGray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Blank 8-bit canvas to draw the contours on:
empty = np.zeros(img.shape[0:2], dtype=np.uint8)

# With THRESH_OTSU the threshold argument (249) is ignored; Otsu picks the level:
ret, imgThresh = cv2.threshold(imgGray, 249, 250, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
kernel_erosion = np.ones((5, 5), np.uint8)
imgErode = cv2.erode(imgThresh, kernel_erosion, iterations=2)

kernel_open = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 3))
imgOpen = cv2.morphologyEx(imgErode, cv2.MORPH_OPEN, kernel_open)

# Note: a 1x1 kernel makes this dilation a no-op:
kernel_dilate = np.ones((1, 1), np.uint8)
imgDilate = cv2.dilate(imgOpen, kernel_dilate, iterations=4)

# OpenCV 4.x returns (contours, hierarchy):
contours, _ = cv2.findContours(imgDilate, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)

new_img = cv2.drawContours(empty, contours, -1, 255, thickness=cv2.FILLED)
plt.imshow(new_img, cmap="gray")
plt.show()

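To the human eye, the lines separated by whitespace are clearly distinguishable. I'm looking for a straightforward way to select the different clusters of contours (i.e., the lines), either by selecting the empty space between the lines or by clustering contours that are close enough to each other into the same line.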
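The second idea, clustering contours that are close to each other into the same line, can be sketched by grouping their bounding boxes by vertical overlap. This is a minimal sketch under stated assumptions: `group_boxes_into_lines` and its `min_overlap` parameter are hypothetical names, the boxes are assumed to be `(x, y, w, h)` tuples such as those returned by `cv2.boundingRect`, and the greedy rule is only one of several possible grouping criteria.

```python
# Hypothetical helper (not an OpenCV API): groups bounding boxes (x, y, w, h)
# into text lines by vertical overlap.
def group_boxes_into_lines(boxes, min_overlap=0.5):
    """Greedily assign boxes to lines by vertical overlap.

    A box joins an existing line when the vertical overlap between the box
    and the line's current y-extent is at least `min_overlap` times the
    smaller of the two heights. Returns a list of lines, top to bottom,
    each a list of boxes sorted left to right.
    """
    lines = []  # each entry: [top, bottom, [boxes]]
    for box in sorted(boxes, key=lambda b: b[1]):
        x, y, w, h = box
        top, bottom = y, y + h
        for line in lines:
            overlap = min(bottom, line[1]) - max(top, line[0])
            if overlap >= min_overlap * min(h, line[1] - line[0]):
                # Grow the line's vertical extent to include this box.
                line[0] = min(line[0], top)
                line[1] = max(line[1], bottom)
                line[2].append(box)
                break
        else:
            # No existing line overlaps enough: start a new one.
            lines.append([top, bottom, [box]])
    lines.sort(key=lambda line: line[0])
    return [sorted(line[2], key=lambda b: b[0]) for line in lines]
```

With the contours from the code above, the input would be `[cv2.boundingRect(c) for c in contours]`. Because a box only has to overlap the line's running vertical extent, mildly skewed lines should still group together, but strongly skewed or touching lines may be merged.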

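A purely statistical approach that just counts the pixels in each row does not seem robust enough, because the lines can be skewed.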
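Skew does not necessarily rule out row projections: a common trick is to estimate the skew angle first, by trying candidate angles and keeping the one whose row profile is sharpest. The sketch below assumes a binary NumPy image whose text pixels are nonzero; `estimate_skew`, the ±10° search range and the 0.5° step are assumptions for illustration, and it shears pixel coordinates instead of rotating the image, which is a reasonable approximation only for small angles.

```python
import numpy as np


def estimate_skew(binary, angles=None):
    """Estimate the slope (in degrees) of text lines in a binary image.

    Positive angles mean lines that descend to the right (the image y axis
    points down). Foreground pixels are sheared by each candidate angle and
    projected onto the y axis; the sharpest profile, measured as the sum of
    squared bin counts, wins.
    """
    if angles is None:
        angles = np.arange(-10.0, 10.5, 0.5)  # assumed search range and step
    ys, xs = np.nonzero(binary)
    if ys.size == 0:
        return 0.0  # nothing to measure on an empty image
    best_angle, best_score = 0.0, -1
    for angle in angles:
        # Undo a slope of tan(angle): pixels on such a line share one value.
        shifted = np.round(ys - xs * np.tan(np.deg2rad(angle))).astype(np.int64)
        counts = np.bincount(shifted - shifted.min())
        score = int(np.sum(counts.astype(np.int64) ** 2))
        if score > best_score:
            best_angle, best_score = float(angle), score
    return best_angle
```

The estimated angle could then be used to level the image (for example with `cv2.getRotationMatrix2D` and `cv2.warpAffine`) before computing row sums; check the rotation sign on your own images, since image coordinates have y pointing down.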

Any ideas on how to segment the line snippets would be greatly appreciated!

Answer 1

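Here is a possible solution. The idea is to reduce the image to a single column, where each value is the sum of all intensity values in the corresponding image row. The regions between text lines should show lower values, which gives us the (approximate) location of the empty rows. These are the steps: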

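Convert the image to grayscale
Get a binary image via Otsu's thresholding
Reduce the image to a single column containing the sum of each image row
Set a threshold and find the minimum sum values
We expect multiple minimum positions per gap, so we take the average of those positions
Use this information to crop the image between two blank areas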
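The averaging step can also be done without K-means (which the answer's code uses) by grouping consecutive below-threshold rows into runs and averaging each run. A minimal sketch, assuming `row_sums` is the per-row sum column; `gap_centers` and `fraction` are hypothetical names, and the 10% default mirrors the threshold chosen in the answer.

```python
import numpy as np


def gap_centers(row_sums, fraction=0.1):
    """Return the mean row index of every run of consecutive blank rows.

    A row counts as blank when its sum is below `fraction` times the
    maximum row sum.
    """
    row_sums = np.asarray(row_sums).ravel()
    blank = row_sums < fraction * row_sums.max()
    # Pad with False so every run has both a start and an end transition.
    padded = np.concatenate(([False], blank, [False])).astype(np.int8)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)  # first row of each run
    ends = np.flatnonzero(edges == -1)   # one past the last row of each run
    # Mean of the consecutive integers start..end-1:
    return [float((s + e - 1) / 2) for s, e in zip(starts, ends)]
```

Unlike K-means, this does not need the number of gaps in advance, but a single noisy row inside a gap splits it into two runs.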

Let's see the code:

# imports:
import cv2
import numpy as np

# Set image path
imagePath = "D://opencvImages//"
imageName = "b6yZO.png"

# Read image in Grayscale mode:
inputImage = cv2.imread(imagePath + imageName, cv2.IMREAD_GRAYSCALE)

# Convert Grayscale to BGR:
inputImage = cv2.cvtColor(inputImage, cv2.COLOR_GRAY2BGR)

# Store a copy for results:
inputCopy = inputImage.copy()

# Convert BGR back to grayscale:
grayInput = cv2.cvtColor(inputImage, cv2.COLOR_BGR2GRAY)

# Threshold via Otsu (inverted, so the text becomes white on black):
threshValue, binaryImage = cv2.threshold(grayInput, 0, 255, cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)
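For reference, this is roughly what the THRESH_OTSU flag computes: the gray level that maximizes the between-class variance of the histogram. A NumPy sketch for 8-bit images; `otsu_threshold` is a hypothetical helper, and OpenCV's own implementation may differ in tie-breaking or rounding details. With THRESH_BINARY_INV, pixels above the returned level become 0 and the rest become 255, so dark text turns white.

```python
import numpy as np


def otsu_threshold(gray):
    """Return the Otsu threshold of an 8-bit grayscale image.

    Picks the level t that maximizes the between-class variance of the two
    classes {pixels <= t} and {pixels > t}.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                # weight of the class "<= t"
    mu = np.cumsum(prob * np.arange(256))  # cumulative mean up to t
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    # Levels where one class is empty give 0/0; treat them as worst.
    sigma_b = np.nan_to_num(sigma_b, nan=0.0, posinf=0.0, neginf=0.0)
    return int(np.argmax(sigma_b))
```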

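Your image seems to be grayscale already (note that, depending on the library you use to process images, this can cause trouble). I load the image in GRAYSCALE mode, convert it to BGR to get a color copy (so I can draw some results on it later), and then convert it back to grayscale to continue processing. This is the binary image I get: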

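Nothing special, just a plain binary image. Note that the black is relatively constant across all the rows of the blank space between paragraphs. Next, reduce the image to a single column using SUM mode: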

# Reduce the ROI to a n row x 1 columns matrix:
reducedImg = cv2.reduce(binaryImage, 1, cv2.REDUCE_SUM, dtype=cv2.CV_32S)
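The same column of row sums can be computed with NumPy, which also shows why the 32-bit output type matters. A small synthetic image is assumed here, not the answer's input:

```python
import numpy as np

# Synthetic binary image: 4 rows x 300 columns, two fully white "text" rows.
binary = np.zeros((4, 300), dtype=np.uint8)
binary[1:3, :] = 255

# NumPy equivalent of cv2.reduce(binary, 1, cv2.REDUCE_SUM, dtype=cv2.CV_32S):
# one sum per image row, kept as an (n_rows x 1) column.
row_sums = binary.sum(axis=1, dtype=np.int32).reshape(-1, 1)

# Accumulating in 8 bits wraps around: 255 * 300 = 76500 does not fit in uint8.
wrapped = binary.sum(axis=1, dtype=np.uint8)
```

Here the white rows sum to 76500 in `row_sums`, while the 8-bit `wrapped` version holds only 76500 mod 256, i.e. 212.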

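cv2.reduce shrinks the image to a single column. Mind the data type: this particular operation produces 32-bit signed integers to store the sum of each row, since an 8-bit result would overflow (a single row can add up to 255 times the image width). Now, let's find the rows with the lowest sums. There will be several of them, because the blank space in the original is several pixels tall. I'll take the maximum value of the column and set the threshold to a fraction of that value: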

# Get the maximum element from the reduced image array:
maxElement = np.amax(reducedImg)

# Define a threshold and accumulate
# the coordinate of the points:
threshValue = 0.1 * maxElement

# Get the height (or length) of the array:
reducedHeight = reducedImg.shape[0]

# We will store the Y coordinate here:
Y = []

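I've set the threshold to 10% of the maximum sum value. I also prepare some variables before looping through the array: reducedHeight is the length of the array, and Y is a list that will store every coordinate whose value is below the threshold. Let's loop through the array: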

# Search for Y coordinates lower
# than the threshold:
for i in range(reducedHeight):
    # Get current value from column:
    currentValue = reducedImg[i]
    # Check out if the value is below the threshold:
    if currentValue < threshValue:
        # Store the value:
        Y.append(i)
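The explicit loop can also be written as a single vectorized NumPy expression. The `reducedImg` array below is a synthetic stand-in, used only so the snippet runs on its own:

```python
import numpy as np

# Synthetic stand-in for reducedImg (n_rows x 1, int32), for illustration only.
reducedImg = np.array([[0], [5], [900], [1000], [40], [0]], dtype=np.int32)

# Same 10% rule as in the answer:
threshValue = 0.1 * np.amax(reducedImg)

# Indices of all rows whose sum is below the threshold (same as the loop):
Y = np.flatnonzero(reducedImg.ravel() < threshValue).tolist()
```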

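Nice. We have stored all the points we need in Y. If we draw these points as horizontal lines, we can visualize the clusters of lines that fall into each blank region. Here is the picture: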

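Since there are many lines, we need an average. Y actually contains two clusters, one for each blank region in the image. There are several ways to compute this, but in the end we need two averages. This looks like a job for K-means, because that is exactly what it does: it receives data, clusters it, and returns the mean center of each cluster. Let's apply K-means to our Y array. First, though, the array needs some processing: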

# Reshape the array for K-means
Y = np.array(Y)
Y = Y.reshape(-1,1)

# K-means operates on 32-bit float data:
floatPoints = np.float32(Y)

# Set the convergence criteria and call K-means:
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
ret, label, center = cv2.kmeans(floatPoints, 2, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)

# Print the centers:
print(center)

Output:

[[24.000002]
 [95.      ]]
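To see what cv2.kmeans computes on this one-dimensional data, here is plain Lloyd's K-means in NumPy. It is a sketch, not OpenCV's implementation: `kmeans_1d` is a hypothetical helper, it uses deterministic quantile initialization instead of KMEANS_RANDOM_CENTERS, and it returns the centers sorted.

```python
import numpy as np


def kmeans_1d(values, k, iterations=100):
    """Lloyd's K-means on scalars; returns the cluster centers, sorted.

    Centers start at evenly spaced quantiles of the data (a deterministic
    choice made here for reproducibility).
    """
    values = np.asarray(values, dtype=np.float64).ravel()
    centers = np.quantile(values, np.linspace(0.0, 1.0, k))
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest center.
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        # Update step: each center moves to the mean of its values
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            values[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return np.sort(centers)
```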

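Yes, those are our centers: one value for cluster 1 and another for cluster 2. The nice thing about K-means is that it also works with more paragraphs: it clusters the points and returns a center for each cluster, as long as K (the second argument of cv2.kmeans, 2 here) matches the number of blank regions. OK, let's use this information to check our two lines: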

# Draw the average lines:
for p in range(len(center)):

    # Get line points:
    x1 = 0
    y1 = int(center[p][0])
    x2 = int(inputCopy.shape[1])
    y2 = y1

    cv2.line(inputCopy, (x1, y1), (x2, y2), (0, 255, 0), 1)
    cv2.imshow("Lines", inputCopy)
    cv2.waitKey(0)

These are the centers (in green; the lines are there, but the image is too small to see them well):

Finally, we can crop the image using this information:

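# Crop image between the two centers. cv2.kmeans does not return
# the centers in any particular order, so sort them first:
yCenters = sorted(int(c[0]) for c in center)
x = 0
y = yCenters[0]
w = inputCopy.shape[1]
h = yCenters[1]

imgCrop = inputImage[y:h, x:w]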
cv2.imshow("imgCrop", imgCrop)
cv2.waitKey(0)

Which produces:
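With more than two blank regions, the same crop generalizes to every pair of consecutive centers. A minimal sketch, assuming `centers` has the (k x 1) shape returned by cv2.kmeans; `crop_between_centers` is a hypothetical helper:

```python
import numpy as np


def crop_between_centers(image, centers):
    """Return the horizontal strips between consecutive gap centers.

    K-means does not return centers in any particular order, so they are
    sorted first. With k gap centers you get k - 1 strips.
    """
    rows = sorted(int(round(float(c))) for c in np.ravel(centers))
    return [image[top:bottom, :] for top, bottom in zip(rows, rows[1:])]
```

With two centers this yields a single strip, equivalent to the crop above.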
