Question

I have a string comparison function; the code is as follows:

#include <stdio.h>
#include <string.h>
#include <math.h>
#include <stdlib.h>

int max=0;

int calcMis(char *string,int i, int j,int len)
{
     int mis=0;
     int k=0;
     while(k<len)
     {
             if(string[i+k]!=string[j+k])
                mis+=1;
             if((mis+len-k-1)<=max)
                 return 1;
             else if(mis>max)
                 return 0;
             k=k+1;
     }
     return mis<=max;   /* reached only when len<=0; avoids falling off the end */
}

int main()
{
    char *input=malloc(2000*sizeof(char));
    scanf("%d",&max);
    scanf("%1999s",input);   /* width limit keeps the input inside the 2000-byte buffer */
    int c=0,i,j,k,x,mis=0;
    int len=strlen(input);
    i=0;
    while(i<len-1)
    {
        j=i;
        while(j<len-1)
         {
             k=i+1;
             x=j-i+1;
             if(x<=max)
                 c=c+len-k-x+1;
             else
                while(k+x<=len)
                {
                  if(strncmp(input+i,input+k,x+1)==0)
                   {
                      if(max>=0)
                          c=c+x;
                   }
                  else
                   c+=calcMis(input,i,k,x);
                  k=k+1;
                }       
            j=j+1;
         }
        i=i+1;
    }   
    printf("%d",c);
    return 0;   
}

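This code is a solution to the following problem:

Given a string S and an integer K, find the integer C that equals the number of pairs of substrings (S1, S2) such that S1 and S2 have equal length and Mismatch(S1, S2) <= K, where the mismatch function is defined as follows.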

For example, for abc the substrings are {a, ab, abc, b, bc, c}.
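The definition of the mismatch function did not survive in the post. A minimal sketch, assuming it counts the positions at which two equal-length substrings differ (which is what calcMis in the code above tallies), could look like this. The name mismatch and its parameters are hypothetical and do not come from the original problem statement.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (an assumption, not from the post): counts the
   positions at which two equal-length substrings of s differ. The
   substrings start at offsets i and j and are both len characters long. */
static int mismatch(const char *s, size_t i, size_t j, size_t len)
{
    assert(s != NULL);
    assert(i + len <= strlen(s) && j + len <= strlen(s));

    int count = 0;
    for (size_t k = 0; k < len; ++k)
        if (s[i + k] != s[j + k])
            ++count;
    return count;
}
```

Under this assumption, the substrings "tes" and "tin" of "testing" (offsets 0 and 3, length 3) differ in two positions.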

Is there a better approach than this? Are there any possible optimizations in this code?

Answer 1

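Note: this analysis was written before the post was edited to include the rest of his/her code. The original post, which this answer addresses, did not mention the main function.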

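Looking at the code for calcMis, here are some readability and style improvements:

Remove all return statements from inside the loop. For a small loop this is not a big deal, but it pays off in larger loops, because a loop with three or four extra exit points is harder to debug.

Rename the parameters according to what the function does.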

Your algorithm runs in O(n), but we can reduce some of the operations it performs. My analysis of your algorithm is as follows:

assignment operator (=) x4: O(1)
while loop x1: O(n), where n is len.
dereference operator (*) x2: O(1)
less than operator (<) x1: O(1)
does not equal operator (!=) x1: O(1)
addition operator (+) x4: O(1)
subtraction operator (-) x2: O(1)
less than or equal to operator (<=) x1: O(1)

Order: O(n) + 2 * O(1) + O(1) + O(1) + 4 * O(1) + 2 * O(1) + 1 * O(1) = O(n)
Order: 4 * O(1) + O(n) = O(n)

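Here is the improved algorithm (micro-efficiency and readability improvements). It is still linear, but it executes fewer instructions and lets the compiler take advantage of const: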

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

extern int max; // Global mismatch limit, defined in the question's code.

bool calcMis( char const * const str, int const i, int const j, int const len )
{
    // Checks preconditions.
    assert( str != NULL );

    // An empty range has 0 mismatches.
    if ( len == 0 )
        return true;

    // Comparing a range with itself gives 0 mismatches.
    if ( i == j )
        return true;

    // Holds the number of mismatches found so far.
    int mis = 0;

    // Iterates over the range, stopping as soon as the limit is exceeded.
    for ( int k = 0; ( k < len ) && ( mis <= max ); k++ )
    {
        // Determines whether there is a mismatch at offsets i + k and j + k.
        if ( str[ i + k ] != str[ j + k ] )
            mis += 1;
    }

    // True when the number of mismatches stays within the limit.
    bool const result = !( mis > max );
    return result;
}

Answer 2

Here is an idea that may help. First, compare all pairs of characters in the string:

void compare_all(char* string, int length, int* comp)
{
    for (int i1 = 0; i1 < length; ++i1)
        for (int i2 = 0; i2 < length; ++i2)
            comp[i1 * length + i2] = (string[i1] != string[i2]);
}

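Here comp is a square matrix containing the values 0 and 1. Every pair of substrings corresponds to a section of a diagonal in this matrix. For example, for the string "testing", the following section of the matrix represents the substrings "tes" and "tin".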

. . . O . . .
. . . . O . .
. . . . . O .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
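To make the diagonal picture concrete, here is a small sketch that builds the 0/1 matrix and sums one diagonal section. The function names build_mismatch_matrix and diagonal_section_sum are hypothetical and chosen only for this illustration; the row-major layout (row index times n plus column index) is the same one the comparison code above assumes.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocates an n*n row-major matrix that holds 1 where
   the characters at positions i1 and i2 differ and 0 where they match.
   Returns NULL if the allocation fails; the caller frees the result. */
static int *build_mismatch_matrix(const char *s, int n)
{
    int *comp = malloc((size_t)n * (size_t)n * sizeof *comp);
    if (comp == NULL)
        return NULL;
    for (int i1 = 0; i1 < n; ++i1)
        for (int i2 = 0; i2 < n; ++i2)
            comp[i1 * n + i2] = (s[i1] != s[i2]);
    return comp;
}

/* Hypothetical helper: sums len entries along a diagonal, starting at
   (row, col). The sum is the number of mismatches between the substrings
   of length len that start at offsets row and col. */
static int diagonal_section_sum(const int *comp, int n, int row, int col, int len)
{
    assert(row >= 0 && col >= 0 && len >= 0);
    assert(row + len <= n && col + len <= n);

    int sum = 0;
    for (int k = 0; k < len; ++k)
        sum += comp[(row + k) * n + (col + k)];
    return sum;
}
```

For "testing", the section drawn above starts at row 0, column 3 and has length 3; its sum is 2, because 'e'/'i' and 's'/'n' differ while 't'/'t' match.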

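You have to count how many sections have an element sum of at most k. To do so, go through all the diagonals parallel to the main diagonal, one by one. To avoid counting anything twice, look only at the diagonals below (or above) the main diagonal (for simplicity, let's include the main diagonal).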

int count_stuff(int* comp, int n, int k)
{
    int result = 0;
    for (int diag = 0; diag < n; ++diag)
    {
        int* first_element_in_diagonal = comp + diag;
        int jump_to_next_element = n + 1;
        int length_of_diagonal = n - diag;
        result += count_stuff_on_diagonal(
            first_element_in_diagonal,
            jump_to_next_element,
            length_of_diagonal,
            k);
    }
    return result;
}

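Now the problem is simpler: count the sections of an integer sequence whose sum is not greater than k. The simplest way is to enumerate all such sections.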

int count_stuff_on_diagonal(int* comp, int jump, int n, int k)
{
    int result = 0;
    for (int i1 = 0; i1 < n; ++i1)
        for (int i2 = i1 + 1; i2 <= n; ++i2)  /* i2 is one past the end, so it may equal n */
        {
            int* first_element_in_section = comp + i1 * jump;
            int mismatches = count_sum_of_section(
                first_element_in_section,
                jump,
                i2 - i1);
            if (mismatches <= k)
                ++result;
        }
    return result;
}
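The answer calls count_sum_of_section without defining it. A minimal sketch, assuming it simply adds up the given number of elements while stepping jump ints between them, might be as follows. Only the name and argument order come from the call above; the body is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed implementation: sums `length` elements starting at `first`,
   advancing `jump` ints between consecutive elements (jump == n + 1
   walks down a diagonal of an n*n row-major matrix). */
int count_sum_of_section(const int *first, int jump, int length)
{
    assert(first != NULL && jump > 0 && length >= 0);

    int sum = 0;
    for (int i = 0; i < length; ++i)
        sum += first[(size_t)i * (size_t)jump];
    return sum;
}
```

Because every section is summed from scratch, the enumeration that calls this helper costs time cubic in the length of each diagonal.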

To speed up computing the sum of a run of consecutive integers, build a table of cumulative sums and use it in place of the matrix of 0s and 1s.
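One way to read this suggestion is sketched below for a single diagonal. The function name count_sections_with_prefix_sums is hypothetical, and building the table per diagonal (rather than once for the whole matrix) is an assumption made to keep the example short. Sections are treated as half-open ranges [i1, i2), so i2 runs up to n inclusive.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical variant of the per-diagonal counter: counts the sections
   [i1, i2) of a strided sequence whose sum is at most k. A cumulative-sum
   table makes each section sum an O(1) subtraction.
   Returns -1 if the table cannot be allocated. */
long count_sections_with_prefix_sums(const int *first, int jump, int n, int k)
{
    assert(first != NULL && jump > 0 && n >= 0);

    /* prefix[i] is the sum of the first i elements, so prefix[0] == 0. */
    int *prefix = malloc(((size_t)n + 1) * sizeof *prefix);
    if (prefix == NULL)
        return -1;
    prefix[0] = 0;
    for (int i = 0; i < n; ++i)
        prefix[i + 1] = prefix[i] + first[(size_t)i * (size_t)jump];

    long result = 0;
    for (int i1 = 0; i1 < n; ++i1)
        for (int i2 = i1 + 1; i2 <= n; ++i2)
            if (prefix[i2] - prefix[i1] <= k)  /* sum of elements i1..i2-1 */
                ++result;

    free(prefix);
    return result;
}
```

With the table in place, the work per diagonal drops from cubic to quadratic in its length, since the innermost summation loop disappears.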

(Please forgive me for not using const and VLAs, and for the occasional syntax error.)

Code optimization for string comparison
