Question

I know how to compare two texts and get all the individual words that appear in both. But how can I match expressions/phrases?

For example: "This is the computer manufacturer Apple" and "Apple is a great computer manufacturer in California"

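Now :)

Apple is obviously in both.
Computer and manufacturer are in both as well. At this point I could check whether they occur as a group of words (one following the other).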

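But for the sake of processing speed, is there a way to match "computer manufacturer" directly, rather than matching each word and then checking whether they appear together as a group?

Keep in mind that the example given is trivial and only for illustration; in practice, more complex sentences/texts may come up.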

Answer 1

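Edit: It sounds like you may be looking for a solution to the longest common substring problem or, more generally, the longest common subsequence problem. If that is the case, you basically need a variation of the "diff" utility, and the details of the implementation depend heavily on the details of your requirements.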
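As a rough illustration of the diff-style idea, here is a minimal sketch that uses Python's standard `difflib.SequenceMatcher` on word tokens instead of characters. The function name `common_phrases` and the `min_words` parameter are made up for this example, and lowercasing plus whitespace splitting is an assumption about how the texts should be tokenized.

```python
from difflib import SequenceMatcher


def common_phrases(text_a, text_b, min_words=1):
    """Return runs of consecutive words that difflib aligns between both texts."""
    # Assumption: case-insensitive matching and whitespace tokenization.
    a = text_a.lower().split()
    b = text_b.lower().split()
    # autojunk=False turns off the heuristic that ignores very frequent tokens.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    phrases = []
    for block in matcher.get_matching_blocks():
        # The last block is a zero-length sentinel; the size check skips it.
        if block.size >= min_words and block.size > 0:
            phrases.append(" ".join(a[block.a:block.a + block.size]))
    return phrases


print(common_phrases(
    "This is the computer manufacturer Apple",
    "Apple is a great computer manufacturer in California",
    min_words=2,
))
```

Note that a diff-style alignment keeps the matched blocks in the same order in both texts, so a word that moves (here "Apple", last in one sentence and first in the other) is not reported. Whether that is acceptable depends on your requirements.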

Answer 2

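You could parse both strings and split on whitespace to get token arrays A1 and A2. Then simply check every contiguous subsequence of A1 for a matching subsequence in A2. This looks like O(n^4) (there are O(n^2) contiguous subsequences in A1, and a naive search for each one in A2 costs up to O(n^2)), which is still better than getting all the individual matches and looking for combinations... that is not polynomial.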

  1. the cat is on the roof
  2. a man is on the stage

  A1 = [the, cat, is, on, the, roof]
  A2 = [a, man, is, on, the, stage]

  [the]: match
  [the, cat]: no match
  [cat]: no match
  [is]: match
  [is, on]: match
  [is, on, the]: match
  [is, on, the, roof]: no match
  [on]: match
  [on, the]: match
  [on, the, roof]: no match
  [the]: match
  [the, roof]: no match
  [roof]: no match
  -end-
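The trace above can be sketched as a pair of nested loops. This is only an illustration of the brute-force idea, not a tuned implementation; the names `common_runs_bruteforce` and `occurs_in` are invented for the example, and it stops extending a run at the first failure, since a longer run with the same start cannot match either.

```python
def common_runs_bruteforce(text_a, text_b):
    """List every contiguous run of words from text_a that also occurs in text_b."""
    a1 = text_a.split()
    a2 = text_b.split()

    def occurs_in(run, tokens):
        # Naive search: compare the run against every window of the same length.
        k = len(run)
        return any(tokens[j:j + k] == run for j in range(len(tokens) - k + 1))

    found = []
    for start in range(len(a1)):
        for end in range(start + 1, len(a1) + 1):
            run = a1[start:end]
            if not occurs_in(run, a2):
                break  # extending a failed run cannot produce a match
            found.append(run)
    return found


for run in common_runs_bruteforce("the cat is on the roof", "a man is on the stage"):
    print(run)
```

Both occurrences of "the" in the first sentence are reported as one-word matches, because "the" also appears in the second sentence. To keep only phrases, filter the result by length, for example `[r for r in found if len(r) >= 2]`.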

Recursion seems like an elegant way to implement something like this. If you need something more efficient, I'm sure there are smarter ways to do it.
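One possible "smarter way", offered here as an assumption rather than what the answer had in mind, is the classic dynamic-programming table for common substrings applied to word tokens. It needs O(n * m) token comparisons for texts of n and m words. The function name `maximal_common_phrases` and the `min_words` parameter are hypothetical.

```python
def maximal_common_phrases(text_a, text_b, min_words=2):
    """Find shared word runs that cannot be extended on either side."""
    a = text_a.split()
    b = text_b.split()
    # prev[j] holds the length of the common run ending at a[i-2] and b[j-1].
    prev = [0] * (len(b) + 1)
    phrases = set()
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended one token earlier in both texts.
                curr[j] = prev[j - 1] + 1
                # The run is maximal if the next tokens differ or a text ends.
                at_end = i == len(a) or j == len(b)
                if (at_end or a[i] != b[j]) and curr[j] >= min_words:
                    phrases.add(" ".join(a[i - curr[j]:i]))
        prev = curr
    # Longest phrases first, ties broken alphabetically.
    return sorted(phrases, key=lambda p: (-len(p.split()), p))


print(maximal_common_phrases("the cat is on the roof", "a man is on the stage"))
```

Unlike a diff-style alignment, this does not care about the order in which shared phrases appear in the two texts, so it also picks up words or phrases that have moved. Matching is case-sensitive as written; normalize the texts first if that is not wanted.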

Find similar words or phrases from two texts
