Number of matching characters between two strings in C++

Time: 2014-09-21 16:55:52

Tags: c++ string spell-checking

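I am building a small spelling-correction project. This is not homework.

Given two strings, str1 and str2, I need to find the number of characters that match between them.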

For example, if str1 = "assign" and str2 = "assingn", the output should be 6.

The characters "a", "s", "s", "i", "g", "n" of str2 are all present in str1, "assign". So the output should be 6.

If str1 = "sisdirturn" and str2 = "disturb", the output should be 6.

The characters "d", "i", "s", "t", "u", "r" of str2 are present in str1, "sisdirturn". So the output should be 6.

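I have made many attempts, but I could not get the right answer. Please help me sort this out, and if you have any ideas for improving on this, please share them.

Here is my attempt so far: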

int char_match (string str1, string str2)
{
    //Take two strings, split them into vector of characters and sort them.
    int i, j, value = 0;
    vector <char> size1, size2;
    char* cstr1 = new char[str1.length() + 1];
    strcpy(cstr1, str1.c_str());
    char* cstr2 = new char[str2.length() + 1];
    strcpy(cstr2, str2.c_str());

    for(i = 0, j = 0 ; i < strlen(cstr1), j < strlen(cstr2); i++, j++)
    {
        size1.push_back( cstr1[i] );
        size2.push_back( cstr2[j] );
    }

    sort (size1.begin(), size1.end() );
    sort (size2.begin(), size2.end() );

    //Start from beginning of two vectors. If characters are matched, pop them and reset the counters.
    i = 0;
    j = 0;

    while ( !size1.empty() )
    {
        out :
        while ( !size2.empty() )
        {

            if (size1[i] == size2[j])
            {
                value++;
                pop_front(size1);
                pop_front(size2);
                i = 0;
                j = 0;
                goto out;
            }
            j++;    
        }
        i++;
    }

    return value;
}

2 Answers:

Answer 0 (score: 3)

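#include <algorithm> // sort, set_intersection
#include <iostream>
#include <iterator>  // back_inserter
#include <string>

std::string::size_type matching_characters(std::string s1, std::string s2) {
  // s1 and s2 are by-value copies, so sorting them leaves the caller's strings alone.
  std::sort(begin(s1), end(s1));
  std::sort(begin(s2), end(s2));
  std::string intersection;
  // On sorted ranges this keeps each common character min(count1, count2) times.
  std::set_intersection(begin(s1), end(s1), begin(s2), end(s2),
                        std::back_inserter(intersection));
  return intersection.size();
}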

int main() {
  std::cout << matching_characters("assign", "assingn") << '\n';     // 6
  std::cout << matching_characters("sisdirturn", "disturb") << '\n'; // 6
}

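The above uses sort, so it has O(N log N) performance, if that matters. If all your inputs are small, then this may be faster than the second solution:

Sora's solution has better complexity, and can also be implemented concisely using the standard <algorithm>s: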

#include <algorithm>  // for_each, min
#include <functional> // plus
#include <iostream>
#include <iterator>   // begin, end
#include <numeric>    // inner_product
#include <string>

int matching_characters(std::string const &s1, std::string const &s2) {
  // One counter per possible unsigned char value, zero-initialised.
  int s1_char_frequencies[256] = {};
  int s2_char_frequencies[256] = {};
  std::for_each(begin(s1), end(s1),
                [&](unsigned char c) { ++s1_char_frequencies[c]; });
  std::for_each(begin(s2), end(s2),
                [&](unsigned char c) { ++s2_char_frequencies[c]; });

  return std::inner_product(std::begin(s1_char_frequencies),
                            std::end(s1_char_frequencies),
                            std::begin(s2_char_frequencies), 0, std::plus<>(),
                            [](auto l, auto r) { return std::min(l, r); });
}

int main() {
  std::cout << matching_characters("assign", "assingn") << '\n';     // 6
  std::cout << matching_characters("sisdirturn", "disturb") << '\n'; // 6
}

I am using C++14 features, such as generic lambdas, for convenience. You may have to make some modifications if your compiler does not support C++14.
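For a compiler limited to C++11, one possible modification is sketched below. It keeps the same frequency-table idea but replaces the generic lambda and `std::plus<>` with plain loops. The function name `matching_characters_cpp11` is made up for this illustration, and the sketch assumes single-byte characters, as the version above does.

```cpp
#include <algorithm> // std::min
#include <string>

// C++11 sketch of the frequency-count approach (hypothetical name).
int matching_characters_cpp11(std::string const &s1, std::string const &s2) {
  // One counter per possible unsigned char value, zero-initialised.
  int s1_char_frequencies[256] = {};
  int s2_char_frequencies[256] = {};

  // Reading each char as unsigned char keeps the array index non-negative.
  for (unsigned char c : s1) ++s1_char_frequencies[c];
  for (unsigned char c : s2) ++s2_char_frequencies[c];

  // A character matches as many times as it occurs in both strings.
  int matches = 0;
  for (int i = 0; i < 256; ++i)
    matches += std::min(s1_char_frequencies[i], s2_char_frequencies[i]);
  return matches;
}
```

Called with the two example pairs from the question, this sketch should agree with the C++14 version.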


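For me, the solution using sort and set_intersection takes about 1/4 of the time of the other solution for these inputs. That is because sorting and iterating over arrays of 6 or 7 elements can be faster than having to walk over arrays of 256 elements.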

sort/set_intersection: 3,667 ns
for_each/inner_product: 16,363 ns

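Once the input is large enough, the speed advantage will tip the other way. Also, at the point where the input is too large to benefit from the small-string optimization, the sort/set_intersection method will start doing expensive memory allocations.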

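Of course, this performance result is highly implementation-dependent, so if the performance of this routine matters, you will have to test it yourself on the target implementation with real input. If it does not matter, then the O(N) solution is the better choice.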
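If you do want to measure this yourself, a rough timing harness might look like the sketch below. The names `matching_sorted`, `matching_counted` and `average_ns` are invented for this illustration, and the harness is deliberately naive: it averages many calls with `std::chrono::steady_clock` and makes no attempt at warm-up or statistical treatment, so treat its numbers as indicative only.

```cpp
#include <algorithm> // sort, set_intersection, min
#include <chrono>    // steady_clock, duration
#include <cstddef>   // size_t
#include <iterator>  // back_inserter
#include <string>

// Sort-based variant (hypothetical name); takes copies so it can sort them.
std::size_t matching_sorted(std::string s1, std::string s2) {
  std::sort(s1.begin(), s1.end());
  std::sort(s2.begin(), s2.end());
  std::string common;
  std::set_intersection(s1.begin(), s1.end(), s2.begin(), s2.end(),
                        std::back_inserter(common));
  return common.size();
}

// Counting variant (hypothetical name); O(N) plus a fixed 256-entry pass.
std::size_t matching_counted(std::string const &s1, std::string const &s2) {
  std::size_t freq1[256] = {};
  std::size_t freq2[256] = {};
  for (unsigned char c : s1) ++freq1[c];
  for (unsigned char c : s2) ++freq2[c];
  std::size_t matches = 0;
  for (int i = 0; i < 256; ++i) matches += std::min(freq1[i], freq2[i]);
  return matches;
}

// Hypothetical helper: average wall-clock nanoseconds per call of f.
template <typename F>
double average_ns(F f, int iterations) {
  volatile std::size_t sink = 0; // stops the compiler discarding the calls
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; ++i) sink = sink + f();
  auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::nano>(stop - start).count() /
         iterations;
}
```

A call such as `average_ns([] { return matching_sorted("assign", "assingn"); }, 100000)` returns the mean time per call. Build with the optimisation level you ship with, and feed it inputs that resemble your real data.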

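Answer 1 (score: 1)

I am not 100% sure what you are actually trying to achieve. If you want to see how many characters match at the same positions in the two words, it is simply a matter of running a loop over them and adding 1 every time you find a match, like so: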

int char_match (string str1, string str2)
{
    // Count the positions at which both strings hold the same character.
    unsigned int matches = 0;

    // Only compare up to the end of the shorter string.
    unsigned int stringLength = (str1.length() > str2.length()) ? str2.length() : str1.length();

    for(unsigned int i = 0; i < stringLength; ++i)
    {
        if(str1[i] == str2[i])
        {
            ++matches;
        }
    }

    return matches;
}

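But from your code it looks like you want to know exactly how many characters the two strings have in common, that is, ignoring the actual position of each character. That is a rather different process, something like this: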

int char_match (string str1, string str2)
{
    // Occurrence count of every possible byte value in each string.
    unsigned int str1CharCount[256] = {0};
    unsigned int str2CharCount[256] = {0};

    unsigned int matches = 0;

    for(unsigned int i = 0; i < str1.length(); ++i)
    {
        // unsigned char keeps the index in 0..255 even where char is signed.
        ++str1CharCount[static_cast<unsigned char>(str1[i])];
    }

    for(unsigned int i = 0; i < str2.length(); ++i)
    {
        ++str2CharCount[static_cast<unsigned char>(str2[i])];
    }

    for(unsigned int i = 0; i < 256; ++i)
    {
        // A character matches as often as it occurs in both strings.
        // std::min needs #include <algorithm>.
        matches += std::min(str1CharCount[i], str2CharCount[i]);
    }

    return matches;
}

Note that there are probably more efficient ways to write the second function, but it should work all the same.
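As one example of a leaner variant, the sketch below uses a single count table: it tallies str1, then consumes one tally for each character of str2 that is still available. This is only an illustration of the same counting idea, and the name `char_match_single_table` is hypothetical. It assumes single-byte characters.

```cpp
#include <string>

// Single-table sketch (hypothetical name): count str1, then consume with str2.
int char_match_single_table(const std::string &str1, const std::string &str2)
{
    // How many of each byte value from str1 are still unmatched.
    int remaining[256] = {0};
    for (unsigned char c : str1)
    {
        ++remaining[c];
    }

    int matches = 0;
    for (unsigned char c : str2)
    {
        // Each character of str2 can use up at most one occurrence from str1.
        if (remaining[c] > 0)
        {
            --remaining[c];
            ++matches;
        }
    }

    return matches;
}
```

This avoids the second table and the final 256-entry pass, which may help on very short inputs, though whether it is measurably faster depends on the implementation.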

Edit:

This code should do what you want. The main difference is that it checks the ASCII value to make sure each character is a valid letter:

int char_match (string str1, string str2)
{
    // Letter counts, indexed by the upper-case ASCII code of each letter.
    unsigned int str1CharCount[256] = {0};
    unsigned int str2CharCount[256] = {0};

    unsigned int matches = 0;

    for(unsigned int i = 0; i < str1.length(); ++i)
    {
        unsigned char aValue = static_cast<unsigned char>(str1[i]);
        if(aValue >= 'a' && aValue <= 'z')
        {
            ++str1CharCount[aValue - 32];   // fold lower case onto upper case
        }
        else if(aValue >= 'A' && aValue <= 'Z')
        {
            ++str1CharCount[aValue];
        }
    }

    // Apply the same filtering and case folding to str2.
    for(unsigned int i = 0; i < str2.length(); ++i)
    {
        unsigned char aValue = static_cast<unsigned char>(str2[i]);
        if(aValue >= 'a' && aValue <= 'z')
        {
            ++str2CharCount[aValue - 32];
        }
        else if(aValue >= 'A' && aValue <= 'Z')
        {
            ++str2CharCount[aValue];
        }
    }

    // Only the entries for 'A'..'Z' can be non-zero, so only those are summed.
    for(unsigned int i = 'A'; i <= 'Z'; ++i)
    {
        // std::min needs #include <algorithm>.
        matches += std::min(str1CharCount[i], str2CharCount[i]);
    }

    return matches;
}