Question

This is an algorithmic problem; any pseudocode or verbal explanation will do (although a Python solution outline would be ideal).

We have a query word A, for example pity. We also have a set of other strings B, each consisting of one or more space-separated words: pious teddy, piston tank yard, oh pity is me!, pesky industrial strength tylenol, etc.

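The goal is to identify those strings B from which A can be constructed. Here "constructed" means that we can take prefixes of one or more words in B, in order, and concatenate them to get A.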

Examples:

pity = pi ston t ank y ard
pity = p esky i ndustrial strength ty lenol
pity = oh pity is me!

On the other hand, pious teddy should not be identified, because we cannot take prefixes of the words pious and teddy and concatenate them into pity.
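To pin the definition down before looking at regexes, here is a minimal brute-force sketch of the check in Python. `can_construct` is a hypothetical name, and the sketch assumes words are split on whitespace, that words may be skipped (as "strength" is in the tylenol example), and that comparison is case-sensitive. It is a memoized recursion over (position in the query, index of the next usable word), useful as a reference for testing faster approaches rather than as the fast check itself.

```python
from functools import lru_cache


def can_construct(query: str, candidate: str) -> bool:
    """Hypothetical reference check: is `query` a concatenation of non-empty
    prefixes of words of `candidate`, taken in order (words may be skipped)?"""
    words = candidate.split()

    @lru_cache(maxsize=None)
    def solve(q_pos: int, w_idx: int) -> bool:
        # The whole query has been consumed: the construction succeeded.
        if q_pos == len(query):
            return True
        # Choose the next word to take a prefix from; earlier words are used up.
        for j in range(w_idx, len(words)):
            word = words[j]
            k = 0
            # Grow the prefix word[:k] while it still agrees with the query.
            while (k < len(word) and q_pos + k < len(query)
                   and word[k] == query[q_pos + k]):
                k += 1
                # Use word[:k], then continue with the words after it.
                if solve(q_pos + k, j + 1):
                    return True
        return False

    return solve(0, 0)


for b in ["pious teddy", "piston tank yard", "oh pity is me!",
          "pesky industrial strength tylenol"]:
    print(b, can_construct("pity", b))
```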

The check should be fast (ideally some regular expression), because the set of strings B may be large.

Answer 1

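You can match these strings with the pattern \bp(?:i|\w*(?>\h+\w+)*?\h+i)(?:t|\w*(?>\h+\w+)*?\h+t)(?:y|\w*(?>\h+\w+)*?\h+y). It assumes that (horizontal) whitespace is used as the word separator. The pattern is easy to construct: match the first letter of the word at a word boundary, then loop over the remaining letters and build (?:[letter]|\w*(?>\h+\w+)*?\h+[letter]) from each of them. Each such group matches the letter either immediately, continuing the current prefix, or after finishing the current word, skipping zero or more whole words, and starting a new word with that letter.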
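Since the question asks for Python, here is a minimal sketch of the same construction. `build_pattern` and `matching` are hypothetical names. Python's `re` has no `\h` escape, so `[ \t]` is substituted (assuming spaces and tabs are the only separators), and the atomic group is emitted only on Python 3.11+, the first version whose `re` supports `(?>...)`. `re.IGNORECASE` mirrors the roIgnoreCase option in the Delphi code.

```python
from __future__ import annotations

import re
import sys

# Python's re has no \h escape, so horizontal whitespace is written as [ \t]
# (assumption: spaces and tabs are the only word separators).
SEP = r"[ \t]+"
# Atomic groups (?>...) exist in re only from Python 3.11; a plain group accepts
# the same strings here, it just allows more backtracking.
GROUP = "(?>" if sys.version_info >= (3, 11) else "(?:"


def build_pattern(word: str) -> re.Pattern:
    """Hypothetical Python port of the answer's pattern construction."""
    # The first letter must start a word.
    parts = [r"\b" + re.escape(word[0])]
    for ch in word[1:]:
        c = re.escape(ch)
        # Either continue the current prefix with c, or finish the current word,
        # skip zero or more whole words, and start a new word with c.
        parts.append(rf"(?:{c}|\w*{GROUP}{SEP}\w+)*?{SEP}{c})")
    return re.compile("".join(parts), re.IGNORECASE)


def matching(word: str, candidates: list[str]) -> list[str]:
    """Return the candidates from which `word` can be constructed."""
    pattern = build_pattern(word)  # compile once, reuse for every candidate
    return [b for b in candidates if pattern.search(b)]


samples = ["pious teddy", "piston tank yard", "oh pity is me!",
           "pesky industrial strength tylenol"]
print(build_pattern("pity").pattern)
print(matching("pity", samples))
```

Compiling the pattern once per query word and reusing it over the whole set B is what keeps the per-string check cheap; each `search` is still a backtracking match, so very long strings cost more.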

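This pattern is basically an unrolled version of \bp(?:i|.*?\bi)(?:t|.*?\bt)(?:y|.*?\by), in which each subsequent letter is either the very next character or, because of the word boundary, the first letter of a later word.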
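The compact form can be generated the same way. This is a hypothetical Python sketch (`build_compact` is an illustrative name; `re.IGNORECASE` is an assumption copied from the Delphi options). One behavioural difference worth knowing: because `.` and `\b` treat any non-word character as a separator, the compact form also accepts a string such as `p me! ity`, which the unrolled, whitespace-only version rejects.

```python
import re


def build_compact(word: str) -> re.Pattern:
    """Hypothetical builder for the compact (non-unrolled) form of the pattern."""
    # The first letter must start a word.
    parts = [r"\b" + re.escape(word[0])]
    for ch in word[1:]:
        c = re.escape(ch)
        # Either the very next character, or (.*? skips lazily, \b requires a
        # word start) the first letter of a later word.
        parts.append(rf"(?:{c}|.*?\b{c})")
    return re.compile("".join(parts), re.IGNORECASE)


compact = build_compact("pity")
print(compact.pattern)
for b in ["pious teddy", "piston tank yard", "oh pity is me!",
          "pesky industrial strength tylenol", "p me! ity"]:
    print(repr(b), bool(compact.search(b)))
```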

You can see it in action here: https://regex101.com/r/r3ZVNE/2

I added the last sample there as a non-matching one, for some tests I did with the atomic groups.

In Delphi I would do it like this:

program ProjectTest;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.RegularExpressions;

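// Builds the prefix pattern for Matchword and prints every string in
// CheckList from which Matchword can be constructed.
procedure CheckSpecialMatches(const Matchword: string; const CheckList: array of string);
var
  I: Integer;
  Pat, C: string;
  RegEx: TRegEx;
begin
  Assert(Matchword.Length > 0);
  // The first letter has to start a word.
  Pat := '\b' + Matchword[1];
  // Every further letter either continues the current prefix or
  // starts one of the following words.
  for I := Low(Matchword) + 1 to High(Matchword) do
    Pat := Pat + Format('(?:%0:s|\w*(?>\h+\w+)*?\h+%0:s)', [Matchword[I]]);
  // Compile once, then reuse the expression for every candidate.
  RegEx := TRegEx.Create(Pat, [roCompiled, roIgnoreCase]);
  for C in CheckList do
    if RegEx.IsMatch(C) then
      WriteLn(C);
end;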

const
  CheckList: array[0..3] of string = (
    'pious teddy',
    'piston tank yard',
    'pesky industrial strength tylenol',
    'prison is ity');
  Matchword = 'pity';
begin
  CheckSpecialMatches(Matchword, CheckList);
  ReadLn;
end.

String matching with multiple prefixes