Question

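I have a problem that I would like to solve as efficiently as possible. As an example, I am given a string of words: A B C D, and I have a dictionary with 5 entries: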

A
B C
B D
D
E

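The dictionary tells me which substrings may appear in my input string. I want to check, as efficiently as possible, whether the whole input string can be split into substrings such that every one of them is found in the dictionary.

In the example, the input string can be matched by splitting it into A, B C and D.

I would like to know whether there is a better way than brute-forcing all possible substrings. The fewer times I have to check whether a substring is in the dictionary, the better.

If there is no possible solution, I do not need to know which substrings could not be found.

Thanks.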

Answer 1

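I would use a tree instead of a dictionary. That speeds up lookups, and it lets you discard whole subtrees from the search as soon as a prefix does not match.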
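To make the tree idea concrete, here is a minimal sketch of a trie keyed by words. The names `TrieNode`, `build_trie` and `has_prefix` are hypothetical helpers invented for illustration, and the sketch assumes the input and the dictionary entries are split on whitespace, as in the question's example.

```python
class TrieNode:
    """One trie node; children are keyed by word (hypothetical helper)."""

    def __init__(self):
        self.children = {}
        self.is_word_end = False


def build_trie(entries):
    """Insert every dictionary entry (a list of words) into a trie."""
    root = TrieNode()
    for entry in entries:
        node = root
        for word in entry:
            node = node.children.setdefault(word, TrieNode())
        node.is_word_end = True  # a dictionary entry ends at this node
    return root


def has_prefix(root, words):
    """Return True if at least one dictionary entry starts with `words`."""
    node = root
    for word in words:
        node = node.children.get(word)
        if node is None:
            return False  # nothing below this point can match
    return True


entries = [e.split() for e in ["A", "B C", "B D", "D", "E"]]
root = build_trie(entries)
```

With this structure, a failed step such as `has_prefix(root, ["C"])` rules out every longer substring starting with C in a single lookup.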

Answer 2

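If you are allowed to use the same substring several times, there is a natural dynamic programming solution.

Let n be the size of the string. Let v be a vector of size n+1 such that v[i] = true if and only if the substring made of the last (n-i) characters of the original string can be decomposed using your dictionary; v[n] stands for the empty suffix and is always true. You can then fill v backward, starting from the last index and decreasing i at each step.

In pseudocode: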

Let D be your dictionary
Let s be your string
Let n be the size of s
(Let s[i:j] denote the substring of s made by characters between i and j (inclusive))
Let v be a vector of size n+1 filled with zeros
Let v[n] = 1
For int i = n-1; i>=0; i--
    For int j = i; j <=n-1; j++
        If (s[i:j] is in D) and (v[j+1] is equal to 1)
            v[i] = 1
            Exit the for loop
Return v[0]
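As a rough runnable counterpart to the pseudocode, here is a sketch in Python. It assumes the input is a list of words and the dictionary is a set of word tuples (my reading of the question's example, not something the answer states); `can_split` is a hypothetical name.

```python
def can_split(tokens, dictionary):
    """Backward DP: v[i] is True when tokens[i:] can be decomposed."""
    n = len(tokens)
    v = [False] * (n + 1)
    v[n] = True  # the empty suffix is trivially decomposable
    for i in range(n - 1, -1, -1):
        for j in range(i, n):
            # tokens[i..j] must be an entry and the rest must be solvable
            if v[j + 1] and tuple(tokens[i:j + 1]) in dictionary:
                v[i] = True
                break
    return v[0]


dictionary = {tuple(e.split()) for e in ["A", "B C", "B D", "D", "E"]}
print(can_split("A B C D".split(), dictionary))  # prints True
```

Checking `v[j + 1]` before the set lookup skips dictionary checks that cannot lead to a solution, which is in the spirit of the question's wish to minimise lookups.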

Answer 3

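You can do it in O(N^2) with the following approach.

First, store all the strings in a trie.

Second, use dynamic programming to solve your problem. For each position i, we compute whether the substring made of the first i symbols can be split into words from the dictionary (trie). For simplicity, we use a forward-style dynamic programming approach.

We start by recording that the substring of the first 0 symbols can be split. Then we iterate i from 0 to N-1. When we reach i, we already know the answer for this position. If a split is possible there, we walk forward from this position and see which strings starting at it are in the trie. For each such string, we mark its end position as reachable. Using the trie, this takes O(N) per iteration of the outer loop.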

t = trie of given words
ans = array of N+1 values, all false
ans[0] = true    // the empty prefix can always be split
for i = 0..N-1
    if ans[i]    // if substring s[0]..s[i-1] can be split into given words
        cur = t.root
        for j = i to N-1    // go along all strings starting from position i
            cur = cur.child(s[j])    // move to the child in the trie
            if cur does not exist    // no word in the trie continues this way
                break
            // now cur corresponds to the string s[i]..s[j]
            if cur.isWordEnd    // a word in the trie ends at cur
                ans[j+1] = true    // the string s[0]..s[j] can be split
your answer is in ans[N]

The total time is O(N^2).
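The trie-plus-forward-DP idea might look like this in Python. This is only a sketch: it assumes the input is a list of words, represents the trie as nested dicts with a sentinel key, and the names `_END`, `build_dict_trie` and `can_split_with_trie` are made up for illustration.

```python
_END = object()  # sentinel key: a dictionary entry ends at this node


def build_dict_trie(entries):
    """Build a trie as nested dicts from entries given as word lists."""
    root = {}
    for entry in entries:
        node = root
        for word in entry:
            node = node.setdefault(word, {})
        node[_END] = True
    return root


def can_split_with_trie(tokens, entries):
    """Forward DP: ans[i] is True when tokens[:i] can be split."""
    trie = build_dict_trie(entries)
    n = len(tokens)
    ans = [False] * (n + 1)
    ans[0] = True  # the empty prefix can always be split
    for i in range(n):
        if not ans[i]:
            continue
        node = trie
        for j in range(i, n):
            node = node.get(tokens[j])
            if node is None:
                break  # no entry continues along this path
            if _END in node:
                ans[j + 1] = True  # tokens[:j + 1] can be split
    return ans[n]


entries = [e.split() for e in ["A", "B C", "B D", "D", "E"]]
print(can_split_with_trie("A B C D".split(), entries))  # prints True
```

The early `break` is what the trie buys you: once no entry continues along the current path, the inner loop stops instead of testing every longer substring.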

How to check whether all substrings can be found in a dictionary
