Question

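Suppose I have the string "abcdpqrs". "dcb" counts as a substring of it, because its characters occur together (contiguously) in the string, in some order. "pdq" also counts as part of the string. But "bcpq" does not. I hope that makes clear what I am after. Is there an efficient way to do this? All I can think of is using a hash. But even an O(n) program takes a long time, because backtracking is needed in many cases. Any help would be appreciated.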

Answer 1

Here is an O(n * alphabet size) solution:

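Maintain an array count, where count[a] = the number of times character a occurs in the current window [pos; pos + length of substring - 1]. It can be updated in O(1) time when the window moves right by 1 (count[s[pos]]--, count[s[pos + substring length]]++, pos++). All that remains is to check, for each pos, whether the count array equals the count array of the substring (which only has to be computed once).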
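A minimal sketch of the window described above, assuming single-byte characters (an alphabet of 256 values). The function name and the array names `need` and `count` are chosen here for illustration and are not from the answer.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Returns true if some window of s is a rearrangement of pattern.
// One full array comparison per window gives O(n * alphabet size).
bool contains_permutation_naive(const std::string& s, const std::string& pattern)
{
    const std::size_t m = pattern.size();
    if (m > s.size()) return false;

    std::array<int, 256> need{};   // counts for pattern, computed once
    std::array<int, 256> count{};  // counts for the current window
    for (unsigned char c : pattern) ++need[c];
    for (std::size_t i = 0; i < m; ++i) ++count[static_cast<unsigned char>(s[i])];

    for (std::size_t pos = 0; ; ++pos) {
        if (count == need) return true;         // window [pos, pos + m - 1] matches
        if (pos + m >= s.size()) return false;  // no further window exists
        --count[static_cast<unsigned char>(s[pos])];      // character leaving the window
        ++count[static_cast<unsigned char>(s[pos + m])];  // character entering the window
    }
}
```

With the strings from the question, "dcb" and "pdq" are found in "abcdpqrs" and "bcpq" is not.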

It can actually be improved to O(n + alphabet size):

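Instead of comparing the count arrays naively, we can maintain a number diff = the number of characters whose count in the current window differs from their count in the substring. The key observation is that diff changes in an obvious way whenever we apply count[c]-- or count[c]++ (it is incremented, decremented, or left unchanged, depending on the value of count[c]). The two count arrays are identical if and only if diff is zero.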
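One way the diff bookkeeping might look, again assuming single-byte characters. Rather than two count arrays, this sketch keeps a single array `balance` (window count minus substring count); that variation and all names are assumptions made here, not part of the answer.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Returns true if some window of s is a rearrangement of pattern.
// Each window shift costs O(1), so the total is O(n + alphabet size).
bool contains_permutation(const std::string& s, const std::string& pattern)
{
    const std::size_t m = pattern.size();
    if (m > s.size()) return false;

    // balance[c] = (count of c in the window) - (count of c in pattern)
    std::array<int, 256> balance{};
    int diff = 0;  // number of characters c with balance[c] != 0

    // Apply a +1 or -1 change to one character and keep diff consistent.
    auto add = [&](unsigned char c, int delta) {
        if (balance[c] == 0) ++diff;  // c matched before and is about to stop matching
        balance[c] += delta;
        if (balance[c] == 0) --diff;  // c matches again
    };

    for (unsigned char c : pattern) add(c, -1);
    for (std::size_t i = 0; i < m; ++i) add(static_cast<unsigned char>(s[i]), +1);

    for (std::size_t pos = 0; ; ++pos) {
        if (diff == 0) return true;             // every character count agrees
        if (pos + m >= s.size()) return false;  // no further window exists
        add(static_cast<unsigned char>(s[pos]), -1);
        add(static_cast<unsigned char>(s[pos + m]), +1);
    }
}
```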

Answer 2

Suppose you have the string "axcdlef" and want to search for "opde":

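#include <algorithm>
#include <string>

bool compare(std::string s1, std::string s2)
{
    // Sort both copies; the strings are rearrangements of each other
    // exactly when the sorted copies are equal.
    std::sort(s1.begin(), s1.end());
    std::sort(s2.begin(), s2.end());
    return s1 == s2;
}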

For this example you need to call this function with each of the following substrings of size 4 (the same length as "opde"):

"axcd" "xcdl" "cdle" "dlef"

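  bool exist = false;

  // text is the string being searched (a name assumed here), search is the pattern.
  for (std::size_t i = 0; !exist && i + search.size() <= text.size(); ++i)
      exist = compare(text.substr(i, search.size()), search);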
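The two fragments of this answer could be combined into one function along the following lines. This is a sketch only: the name `exists_sorted` and the parameter `text` are assumptions, and it sorts the search string once up front instead of on every call. Sorting each window still costs O(m log m), so the whole search is roughly O(n * m log m).

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Returns true if some window of text is a rearrangement of search.
bool exists_sorted(const std::string& text, std::string search)
{
    if (search.size() > text.size()) return false;

    std::sort(search.begin(), search.end());  // sort the pattern only once

    for (std::size_t i = 0; i + search.size() <= text.size(); ++i) {
        std::string window = text.substr(i, search.size());
        std::sort(window.begin(), window.end());
        if (window == search) return true;
    }
    return false;
}
```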

Answer 3

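You can use regular expressions (e.g. Boost or Qt). Or you can use this simple approach. You know the length k of the string s that you want to search for in the string str. So take every run of k consecutive characters from str and check whether it consists of exactly the characters of s.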

Starting point (a naive implementation that leaves room for further optimization):

#include <iostream>
#include <string>

/* pos position where to extract probable string from str
*  s string set with possible repetitions being searched in str
*  str original string
*/
bool find_in_string( int pos, std::string s, std::string str)
{
    // Candidate window of str with the same length as s.
    std::string str_s = str.substr( pos, s.length());

    while( !s.empty())
    {
        std::size_t found = str_s.find( s[0]);
        if ( found!=std::string::npos)
        {
            s.erase( 0, 1);
            str_s.erase( found, 1);
        } else return false;  // s[0] does not occur in what is left of the window
    }

    return true;
}

bool find_in_string( std::string s, std::string str)
{
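    // Guard first: str.length() - s.length() is unsigned and would wrap around.
    if ( s.length() > str.length()) return false;

    bool found = false;
    int pos = 0;
    const int last = static_cast<int>( str.length() - s.length());
    while( !found && pos <= last)
    {
        found = find_in_string( pos++, s, str);
    }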

    return found;
}

Usage:

int main() {

    std::string s1 = "abcdpqrs";
    std::string s2 = "adcbpqrs";
    std::string searched = "dcb";
    std::string searched2 = "pdq";
    std::string searched3 = "bcpq";
    std::cout << find_in_string( searched, s1);
    std::cout << find_in_string( searched, s2);
    std::cout << find_in_string( searched2, s1);
    std::cout << find_in_string( searched3, s1);

    return 0;
}

Prints: 1110

http://ideone.com/WrSMeV

Answer 4

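To use arrays for this, you will need some extra code to map each character to its position in the array... unless you know that you are only using 'a' - 'z' or something similar, in which case you can simply subtract 'a' to get the position.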

#include <map>
#include <string>

// Assumed here: one slot per possible char value.
const int SIZE_OF_ALPHABET = 256;

bool compare(const std::string& s1, const std::string& s2)
{
    int v1[SIZE_OF_ALPHABET];
    int v2[SIZE_OF_ALPHABET];
    int count = 0;
    std::map<char, int> mymap;

    // Give every distinct character of s1 its own index.
    for (char letter : s1)
        if (mymap.find(letter) == mymap.end())
            mymap[letter] = count++;

    // A character of s2 that would be new to the map never occurs in s1,
    // so the strings cannot match and we can return false right away.
    for (char letter : s2)
        if (mymap.find(letter) == mymap.end())
            return false;

    // count now holds the number of distinct chars in both strings,
    // so only 'count' positions of the arrays need to be checked.
    for (int i = 0; i < count; i++)
        v1[i] = v2[i] = 0;

    // Count the occurrences of each character in s1 and in s2.
    for (char letter : s1)
        v1[mymap[letter]]++;
    for (char letter : s2)
        v2[mymap[letter]]++;

    for (int i = 0; i < count; i++)
        if (v1[i] != v2[i])
            return false;

    return true;
}
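For the 'a' - 'z' case mentioned in this answer, the map can be dropped and the index obtained by subtracting 'a'. The following is a sketch under that assumption only; the name `compare_lowercase` is made up here, and any character outside 'a'..'z' is simply rejected.

```cpp
#include <string>

// Assumes both strings contain only 'a'..'z' (contiguous, as in ASCII).
bool compare_lowercase(const std::string& s1, const std::string& s2)
{
    if (s1.size() != s2.size()) return false;

    int counts[26] = {0};
    for (char c : s1) {
        if (c < 'a' || c > 'z') return false;  // outside the assumed alphabet
        ++counts[c - 'a'];                     // position = distance from 'a'
    }
    for (char c : s2) {
        if (c < 'a' || c > 'z') return false;
        --counts[c - 'a'];
    }

    // Every counter is back at zero exactly when the letter counts agree.
    for (int n : counts)
        if (n != 0) return false;
    return true;
}
```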

Answer 5

Here is a solution that is O(m) in the best case and O(m!) in the worst case, where m is the length of the search string:

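Use a suffix trie, e.g. an Ukkonen trie (there are some floating around, but I don't have a link at hand right now), and search for any permutation of the substring. Note that, regardless of the size of n, any lookup takes only O(1) for each character of the string being searched for.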

However, while the size of n does not matter, this becomes impractical for large m.
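To illustrate only the "search for any permutation" part, the sketch below enumerates the permutations with `std::next_permutation`. It does not build a suffix trie: `std::string::find` stands in for the trie lookup, so each lookup here depends on n instead of costing O(m). The function name is an assumption.

```cpp
#include <algorithm>
#include <string>

// Tries up to m! orderings of pattern; returns true as soon as one occurs in text.
bool find_any_permutation(const std::string& text, std::string pattern)
{
    std::sort(pattern.begin(), pattern.end());  // start from the smallest permutation
    do {
        // Stand-in for the suffix trie lookup described in this answer.
        if (text.find(pattern) != std::string::npos) return true;
    } while (std::next_permutation(pattern.begin(), pattern.end()));
    return false;
}
```

The loop makes the growth in m visible: a 3-character pattern needs at most 6 lookups, a 10-character pattern up to 3,628,800.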

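If n is small enough, and one is willing to trade index size for lookup performance, the suffix trie can store a string that contains all permutations of the original string.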

Lookups will then always be O(m).

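I would suggest going with the accepted answer for the general case. However, here you have a suggestion that can perform (much) better for small substrings and large strings.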

Find a substring within a string, with the substring's characters in any order, in C/C++
