Question

I have indexed the text of PDF files in my database, but sometimes the text is not clean and there are spaces between the letters:

var text = 'C or P ora te go V ernan C e report M ANA g EMENT bO A r D AND s u PE r V is O r y bO A r D C OMM i TTEE s The Management Board has not currently established any committees.';

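I want to build a front-end search engine for my users, but I need to know the START and END position of every match (relative to the original text, spaces included).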

I can do this with a regular expression. For example, if I do:

text.toLowerCase().search(/m ? a ? n ? a ? g ? e ? m ? e ? n ? t/);

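I find the word "management" starting at position 36. Now I also want to know the word's end position: since I don't know how many spaces occur inside the word, I don't know how many characters it spans. I also want the search to return multiple matches (a start/end position for every result).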

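Can you help me? Again, it is very important for me to get the start/end position of each word in the original text; removing the spaces and then searching is not a good solution for me.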

I would also like to know whether I can do this without a regular expression.
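One possible approach to the no-regex part, as a minimal sketch: walk the text with two indices, one in the text and one in the search word, and skip spaces only between letters. The function name `findIgnoringSpaces` and the `{ start, end }` result shape are assumptions for illustration; `end` is inclusive (the index of the last letter).

```javascript
// Hypothetical helper: finds every occurrence of `word` in `text` while
// skipping spaces between letters, using plain loops instead of a regex.
// Positions refer to the ORIGINAL text; `end` is inclusive.
function findIgnoringSpaces(text, word) {
  const needle = word.toLowerCase().split(' ').join(''); // letters to look for
  const results = [];
  if (needle.length === 0) return results;

  for (let start = 0; start < text.length; start++) {
    // A match has to begin on the first letter of the word, never on a space.
    if (text[start].toLowerCase() !== needle[0]) continue;

    let i = start; // current position in the text
    let j = 0;     // current position in the word
    while (i < text.length && j < needle.length) {
      if (text[i].toLowerCase() === needle[j]) {
        i++;
        j++;
      } else if (text[i] === ' ') {
        i++; // a space between two letters: skip it
      } else {
        break; // any other character means no match from this start
      }
    }
    if (j === needle.length) {
      results.push({ start, end: i - 1 }); // i - 1 is the last matched letter
    }
  }
  return results;
}

const text = 'C or P ora te go V ernan C e report M ANA g EMENT bO A r D AND s u PE r V is O r y bO A r D C OMM i TTEE s The Management Board has not currently established any committees.';
console.log(findIgnoringSpaces(text, 'management'));
// expected: [{ start: 36, end: 48 }, { start: 111, end: 120 }]
```

Unlike a global regex, this loop tries every starting position, so it can report overlapping matches (for example, "aa" in "a a a" is found at 0 and at 2). It compares one character at a time, so lowercasing never shifts the indices.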

Thanks!

Answer 1

This demo may help:

> text.toLowerCase().match(/m *a *n *a *g *e *m *e *n *t/)
[ 'm ana g ement',
  index: 36,
  input: 'c or p ora te go v ernan c e report m ana g ement bo a r d and s u pe r v is o r y bo a r d c omm i ttee s the management board has not currently established any committees.' ]

(I changed your regex to use ' *' between letters, which matches any number of spaces, including 0. Your ' ? ' version only matches 1 or 2 spaces between letters.)
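To search for arbitrary user input rather than a hard-coded word, the ' *' pattern can be generated from the query. A minimal sketch, using a hypothetical helper named `buildSpacedPattern`: it escapes regex metacharacters so a query like `c++` does not break the pattern, and it uses the `i` flag instead of `toLowerCase()`, so indices always refer to the original string (`toLowerCase()` can change the length of a few characters, such as "İ").

```javascript
// Hypothetical helper: builds a regex that allows any number of spaces
// (including none) between consecutive characters of `term`.
// buildSpacedPattern('management') -> /m *a *n *a *g *e *m *e *n *t/gi
function buildSpacedPattern(term) {
  // Escape regex metacharacters so user input is matched literally.
  const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Ignore spaces the user typed; split the rest into characters.
  const chars = Array.from(term.split(' ').join(''));
  // 'g' allows multiple matches, 'i' makes the search case-insensitive.
  return new RegExp(chars.map(escapeRegExp).join(' *'), 'gi');
}

console.log(buildSpacedPattern('management').source); // m *a *n *a *g *e *m *e *n *t
```

For the sample text, `text.search(buildSpacedPattern('management'))` returns 36; `search` ignores the `g` flag and always reports the first match.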

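If the regex matches, the .match method returns an array containing the matched text, with its start position in the `index` property (as shown above); otherwise it returns null. You can use this to do something like: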

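const matches = text.toLowerCase().match(/m *a *n *a *g *e *m *e *n *t/);
if (matches) {
    const start = matches.index;                        // position of the first letter ("m")
    const end = matches.index + matches[0].length - 1;  // position of the last letter ("t"), inclusive
    console.log(start, end);                            // 36 48 for the sample text
}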
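The snippet above only finds the first match. For the multi-match requirement in the question, here is a sketch using `String.prototype.matchAll` (ES2020), which requires the `g` flag and yields every match together with its `index`. The `findAllMatches` name and result shape are assumptions; the `i` flag is applied to the original text, so the returned `match` keeps its original casing.

```javascript
// Sketch: collect the start/end position of every match, not just the first.
// `matchAll` throws a TypeError if the regex lacks the global (g) flag.
function findAllMatches(text, pattern) {
  const results = [];
  for (const m of text.matchAll(pattern)) {
    results.push({
      start: m.index,                 // first character of the match
      end: m.index + m[0].length - 1, // last character (inclusive)
      match: m[0],                    // matched text, spaces included
    });
  }
  return results;
}

const text = 'C or P ora te go V ernan C e report M ANA g EMENT bO A r D AND s u PE r V is O r y bO A r D C OMM i TTEE s The Management Board has not currently established any committees.';
console.log(findAllMatches(text, /m *a *n *a *g *e *m *e *n *t/gi));
// expected: [{ start: 36, end: 48, match: 'M ANA g EMENT' },
//            { start: 111, end: 120, match: 'Management' }]
```

In environments without `matchAll`, a `while ((m = re.exec(text)) !== null)` loop over the same `g` regex gives the same results.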
