Question

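I'm trying to make preg_replace take similar text into account in the pattern. My goal is to remove a given string from text produced by OCR software, where some letters may have been confused.

Let's take a code example: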

$ocr = 'Appartamento sito in Vioolo San Vincenzo, n.4 e censito al';
$result = preg_replace('#\bVicolo San Vincenzo[, ]+([0-9]+|n[\.]? ?[0-9]+)?\b#', '<removed text>', $ocr);

Note: the OCR confused the third letter, c, with o.

Providing a better OCR, or improving the existing one, is not possible here.

Input string:

Appartamento sito in Vioolo San Vincenzo, n.4 e censito al

Expected result after the preg_replace call above:

Appartamento sito in <removed text> e censito al

Actual result:

Appartamento sito in Vioolo San Vincenzo, n.4 e censito al

Text should count as similar in the sense of the PHP functions levenshtein() and similar_text() (I'm not considering soundex() or metaphone(), because the text isn't in English).
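To make "similar" concrete, here is a minimal sketch comparing the intended street name with the OCR misreading from the example. The variable names are only illustrative; both functions are standard PHP string functions and operate on bytes.

```php
<?php
// Intended street name and the OCR misreading from the example above.
$expected = 'Vicolo San Vincenzo';
$ocrText  = 'Vioolo San Vincenzo';

// levenshtein() counts single-character insertions, deletions and substitutions.
$distance = levenshtein($expected, $ocrText);

// similar_text() returns the number of matching characters and writes a
// similarity percentage into its third argument, passed by reference.
$common = similar_text($expected, $ocrText, $percent);

printf("distance=%d common=%d percent=%.1f\n", $distance, $common, $percent);
```

A single substituted letter gives a Levenshtein distance of 1, so a small threshold such as 1 or 2 would accept this misreading.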

Using preg_replace is not mandatory, but I at least need to be able to evaluate the string against something equivalent to that pattern.
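One possible approach, sketched under assumptions rather than offered as a definitive answer: find the fixed part of the pattern with levenshtein() over a sliding window, then apply the variable part (the house number) as an ordinary regex anchored right after the fuzzy match. The helper name fuzzyReplace, its parameters and the default threshold of 2 are hypothetical, and the code assumes single-byte text because levenshtein() and substr() work on bytes.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: replaces the first substring of $text that lies within
 * $maxDistance edits of $needle, plus an optional tail matched by $tailPattern.
 * Assumes single-byte text, since levenshtein() and substr() work on bytes.
 */
function fuzzyReplace(string $text, string $needle, string $tailPattern,
                      string $replacement, int $maxDistance = 2): string
{
    $best = null; // [distance, offset, length] of the closest window so far
    $needleLen = strlen($needle);
    $textLen = strlen($text);

    // Try windows slightly shorter and longer than the needle, so that OCR
    // insertions and deletions are covered as well as substitutions.
    for ($len = max(1, $needleLen - $maxDistance); $len <= $needleLen + $maxDistance; $len++) {
        for ($offset = 0; $offset + $len <= $textLen; $offset++) {
            $distance = levenshtein($needle, substr($text, $offset, $len));
            if ($distance <= $maxDistance && ($best === null || $distance < $best[0])) {
                $best = [$distance, $offset, $len];
            }
        }
    }

    if ($best === null) {
        return $text; // nothing close enough to the needle
    }

    [, $offset, $len] = $best;
    $end = $offset + $len;

    // \G anchors the tail pattern at the end of the fuzzy match.
    if (preg_match('#\G(?:' . $tailPattern . ')#', $text, $m, 0, $end) === 1) {
        $end += strlen($m[0]);
    }

    return substr_replace($text, $replacement, $offset, $end - $offset);
}

$ocr = 'Appartamento sito in Vioolo San Vincenzo, n.4 e censito al';
$tail = '[, ]+(?:[0-9]+|n\.? ?[0-9]+)\b';
echo fuzzyReplace($ocr, 'Vicolo San Vincenzo', $tail, '<removed text>'), "\n";
```

With the input from the question this prints "Appartamento sito in <removed text> e censito al". The scan is brute force (every window is compared against the needle), which should be acceptable for short OCR lines but would need a smarter search for long documents.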

preg_replace accounting for similar text

0 answers: