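I have been trying to split a weekly menu PDF into its grid boxes so that I can crop each one and then OCR each menu separately with Tesseract OCR.
I have seen lineJunctions, which might help here, but I cannot find them in the Imagick PHP documentation. I have also seen Hough Lines in a similar stackoverflow question, but again cannot find them in the PHP documentation.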
//read the image
$im = new Imagick();
$im->readimage('menu.png');
//resize and contrast
$im->resizeImage($im->getImageWidth()/6, $im->getImageHeight()/6 , 9, 1);
$im->thresholdImage( 0.65 * Imagick::getQuantum() );
//remove "noise"
//this is done by creating two new images where only horizontal lines, then vertical are preserved using morphology and then combined into one
$horizontalLines = clone $im;
$verticalLines = clone $im;
$horizontalLineKernel = \ImagickKernel::fromBuiltIn(\Imagick::KERNEL_RECTANGLE, "19x1");
$horizontalLines->morphology(\Imagick::MORPHOLOGY_CLOSE, 1, $horizontalLineKernel);
$verticalLineKernel = \ImagickKernel::fromBuiltIn(\Imagick::KERNEL_RECTANGLE, "1x15");
$verticalLines->morphology(\Imagick::MORPHOLOGY_CLOSE, 1, $verticalLineKernel);
$horizontalLines->compositeimage($verticalLines, 5, 0, 0);
$im = clone $horizontalLines;
$horizontalLines->clear();
$horizontalLines->destroy();
$verticalLines->clear();
$verticalLines->destroy();
// Create boxes at corners
// These are the points from which I intend to create the individual grid boxes
$plusKernel = \ImagickKernel::fromBuiltIn(\Imagick::KERNEL_PLUS, "4");
$im->morphology(\Imagick::MORPHOLOGY_OPEN, 1, $plusKernel);
$squareKernel = \ImagickKernel::fromBuiltIn(\Imagick::KERNEL_SQUARE, "2");
$im->morphology(\Imagick::MORPHOLOGY_CLOSE, 1, $squareKernel);
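By doing this I end up with an image with boxes at the corners. If I could get the x, y, width and height of each box, I should be able to derive the coordinates. However, it misses the bottom right corner and the result is very messy. I am sure there must be a better way to do it.
The image is downscaled, and I was then planning to scale the coordinates up by 6, matching the factor used in $im->resizeImage(). Is there a better way I can do this?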
Answer 0 (score: 1)
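One way to do this in ImageMagick (assuming the lines are horizontal and vertical) is to scale the image down to a single row and to a single column, threshold each result, and filter the txt: output for the black pixels. The black pixels in the row give the x positions of the vertical lines, and those in the column give the y positions of the horizontal lines.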
xlist=`convert cells.png -scale x1! -auto-level -threshold 27% -negate -morphology Thinning:-1 Skeleton -negate txt:- | grep "black" | cut -d, -f1`
echo "$xlist"
38
109
180
251
322
394
465
536
ylist=`convert cells.png -scale 1x! -auto-level -threshold 27% -negate -morphology Thinning:-1 Skeleton -negate txt:- | grep "black" | cut -d: -f1 | cut -d, -f2`
echo "$ylist"
45
141
256
381
Combining every x value with every y value gives you the array of intersection points.
xArr=($xlist)
yArr=($ylist)
numx=${#xArr[*]}
numy=${#yArr[*]}
pointlist=""
for ((j=0; j<numy; j++)); do
for ((i=0; i<numx; i++)); do
pointlist="$pointlist ${xArr[$i]},${yArr[$j]}"
done
done
echo "pointlist=$pointlist"
pointlist= 38,45 109,45 180,45 251,45 322,45 394,45 465,45 536,45 38,141 109,141 180,141 251,141 322,141 394,141 465,141 536,141 38,256 109,256 180,256 251,256 322,256 394,256 465,256 536,256 38,381 109,381 180,381 251,381 322,381 394,381 465,381 536,381
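To get from the intersections to crop regions, each pair of neighbouring x values and neighbouring y values bounds one cell. The following is a minimal sketch of that step, not part of the original commands: the function name `cell_geometries` and its optional scale argument are assumptions introduced here for illustration. The scale argument multiplies every value, which would map the coordinates back to the full-size image if the lines were detected on a downscaled copy (for example by 6).

```shell
#!/usr/bin/env bash
# cell_geometries XLIST YLIST [SCALE]   (hypothetical helper, not an ImageMagick command)
# Prints one WxH+X+Y crop geometry per grid cell, row by row.
# SCALE defaults to 1; every value is multiplied by it.
cell_geometries() {
  local -a xs=($1) ys=($2)   # split the whitespace-separated lists into arrays
  local scale=${3:-1} i j
  for ((j = 0; j < ${#ys[@]} - 1; j++)); do
    for ((i = 0; i < ${#xs[@]} - 1; i++)); do
      # width and height are the distances to the next line to the right and below
      echo "$(( (xs[i+1] - xs[i]) * scale ))x$(( (ys[j+1] - ys[j]) * scale ))+$(( xs[i] * scale ))+$(( ys[j] * scale ))"
    done
  done
}

# Values taken from the xlist and ylist output above.
xlist="38 109 180 251 322 394 465 536"
ylist="45 141 256 381"
cell_geometries "$xlist" "$ylist"
```

Each printed geometry can then be passed to something like `convert cells.png -crop "$geom" +repage cell_N.png`. Note that a cell cropped this way starts on the grid line itself, so you may want to shrink it by a few pixels before OCR.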
You can visualize the single-row and single-column results by stretching them back out:
convert cells.png -scale x1! -scale 550x50! -auto-level -threshold 27% tmp1.png
convert cells.png -scale 1x! -scale 50x425! -auto-level -threshold 27% tmp2.png
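Without the thinning step, the top horizontal line is more than one pixel thick, so it would produce several adjacent y values instead of one.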
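If the thinning step is unavailable, a thick line shows up as a run of consecutive coordinates in the list. A rough substitute, which is a different technique from the morphology-based thinning used above, is to collapse each run to its midpoint after the fact. The helper name `collapse_runs` and the sample values `44 45 46` are assumptions made up for illustration, and the input is assumed to be sorted in ascending order.

```shell
#!/usr/bin/env bash
# collapse_runs LIST   (hypothetical helper)
# Replaces each run of consecutive integers in an ascending list with the
# integer midpoint of that run, printing one value per line.
collapse_runs() {
  local -a v=($1)
  local start prev n
  ((${#v[@]})) || return 0          # nothing to do for an empty list
  start=${v[0]}; prev=${v[0]}
  for n in "${v[@]:1}"; do
    if ((n != prev + 1)); then      # gap found: the current run has ended
      echo $(( (start + prev) / 2 ))
      start=$n
    fi
    prev=$n
  done
  echo $(( (start + prev) / 2 ))    # emit the final run
}

# Illustrative input: a 3-pixel-thick top line followed by thin lines.
collapse_runs "44 45 46 141 256 381"
```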