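The most efficient way to find out whether a string is mixed case

Asked: 2019-08-27 21:39:58

Tags: c string algorithm optimization

Suppose I have very long strings, and I want to see whether a column is allLower, allUpper, or mixedCase. For example, take the following column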

text
"hello"
"New"
"items"
"iTem12"
"-3nXy"

The text column is mixedCase. A naive algorithm to determine this might be:

int is_mixed_case, is_all_lower, is_all_upper;
int has_lower = 0;
int has_upper = 0;
// for each row...for each column...
// (s is the current cell's NUL-terminated string)
char c;
for (int i = 0; (c=s[i]) != '\0'; i++) {
    if (c >='a' && c <= 'z') {
        has_lower = 1;
        if (has_upper) break;
    }
    else if (c >='A' && c <= 'Z') {
        has_upper = 1;
        if (has_lower) break;
    }
}

is_all_lower = has_lower && !has_upper;
is_all_upper = has_upper && !has_lower;
is_mixed_case = has_lower && has_upper;

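However, I'm sure there is a more efficient way to do this. What is the most efficient way to perform this algorithm/computation?

6 Answers:

Answer 0: (score: 4)

If you know which character encoding will be used (the code example uses ISO/IEC 8859-15), a lookup table is probably the fastest solution. It also lets you decide which characters of the extended character set (such as µ or ß) count as uppercase, lowercase, or non-letters.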

char test_case(const char *s) {
    static const char alphabet[] = {
        0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
        0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
        0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
        0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
        0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,  //  ABCDEFGHIJKLMNO
        1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,  // PQRSTUVWXYZ
        0,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,  //  abcdefghijklmno
        2,2,2,2,2,2,2,2,2,2,2,0,0,0,0,0,  // pqrstuvwxyz
        0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
        0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
        0,0,0,0,0,0,1,0,2,0,2,0,0,0,0,0,  //       Š š ª
        0,0,0,0,1,2,0,0,2,0,2,0,1,2,1,0,  //     Žµ  ž º ŒœŸ
        1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,  // ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏ
        1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,2,  // ÐÑÒÓÔÕÖ ØÙÚÛÜÝÞß (ß counted as lowercase)
        2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,  // àáâãäåæçèéêëìíîï
        2,2,2,2,2,2,2,0,2,2,2,2,2,2,2,2}; // ðñòóôõö øùúûüýþÿ
    char cases = 0;
    while (*s && cases != 3) {
        cases |= alphabet[(unsigned char) *s++];
    }
    return cases; // 0 = none, 1 = upper, 2 = lower, 3 = mixed
}

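Following the suggestion in chux's comment, you can set alphabet[0] to 4 and then use just a single condition, cases < 3, in the while loop. The value 4 is OR-ed in when the terminator is read, so mask the result with & 3 if you still want the 0 to 3 return values.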
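As a rough sketch of that sentinel idea: the table below is a hypothetical ASCII-only one built at startup to keep the example short (it is not the ISO/IEC 8859-15 table from the answer), and the names case_table, init_case_table, and test_case_sentinel are made up for illustration. It assumes the letters A-Z and a-z are contiguous, as they are in ASCII.

```c
/* Hypothetical ASCII-only lookup table; entry 0 holds the sentinel value 4. */
static unsigned char case_table[256];

static void init_case_table(void) {
    for (int c = 'A'; c <= 'Z'; c++) case_table[c] = 1;  /* uppercase */
    for (int c = 'a'; c <= 'z'; c++) case_table[c] = 2;  /* lowercase */
    case_table[0] = 4;  /* reading the terminator pushes cases to 4 or more */
}

/* Returns 0 = no letters, 1 = upper only, 2 = lower only, 3 = mixed. */
static int test_case_sentinel(const char *s) {
    unsigned cases = 0;
    /* One comparison per character: the loop stops either when both
       cases have been seen (3) or when the sentinel was OR-ed in (>= 4). */
    while (cases < 3) {
        cases |= case_table[(unsigned char)*s++];
    }
    return (int)(cases & 3);  /* drop the sentinel bit */
}
```

Call init_case_table() once before the first lookup. Whether the single-condition loop is measurably faster than the two-condition one depends on the compiler and the data, so it is worth benchmarking on real input.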

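Answer 1: (score: 2)

This should be fairly efficient: it examines the minimum number of characters needed. It assumes the input is biased toward lowercase characters, so checking for lowercase first should be more efficient: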

#include <ctype.h>

int ismixed( const unsigned char *str )
{
    int hasUpper = 0;
    int hasLower = 0;

    while ( *str )
    {
        // can't be both upper and lower case
        // but it can be neither
        if ( islower( *str ) )
        {
            hasLower = 1;
        }
        else if ( isupper( *str ) )
        {
            hasUpper = 1;
        }

        // return true as soon as we hit
        // both upper and lower case
        if ( hasLower && hasUpper )
        {
            return( 1 );
        }

        str++;
    }

    return( 0 );
}

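Depending on whether your input leans toward lowercase or uppercase, it may be better to check isupper() first.

Answer 2: (score: 2)

If we assume ASCII,

and if we assume every character is a letter,

then the code only needs to count the "case" bits. Is the sum 0, equal to the string length, or something in between?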

#include <stddef.h>
#include <stdio.h>

void test_case(const char *s) {
  const char *start = s;
  size_t sum = 0;
  size_t mask = 'A' ^ 'a';  // the single bit that differs between cases in ASCII
  while (*s) {
    sum += *s++ & mask;
  }
  size_t len = (size_t)(s - start);
  sum /= mask;  // number of characters with the case bit set
  if (len == 0) puts("Empty string");
  else if (sum == 0) puts("All UC");
  else if (sum == len) puts("All LC");
  else puts("Mixed");
}

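Note: with a slight modification, this also works for EBCDIC.

Answer 3: (score: 1)

Is the string guaranteed to contain only letters? If so, you can check whether any two consecutive characters differ in case.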

#include <ctype.h>
#include <errno.h>
int mixed_case(const char *str) {
   if(!str){
      // sanity check
      errno = EINVAL;
      return -1;
   }

   // can't be mixed-case without more than one letter
   if(str[0] == '\0' || str[1] == '\0'){
      return 0;
   }

   for(int i = 1; str[i] != '\0' ; ++i) {
      if (!islower((unsigned char)str[i]) ^ !islower((unsigned char)str[i-1])) {
         // if two letter next to each other are not the same case, it's mixed case
         return 1;
      }
   }
   // didn't find any mismatches, so not mixed case
   return 0;
}

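Here is a similar approach, but instead of looking at consecutive characters, it finds the first alphabetic character and compares it against every other alphabetic character it finds. This should handle strings that contain non-alphabetic characters.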

int mixed_case(const char *str) {
   if(!str){
      // sanity check
      errno = EINVAL;
      return -1;
   }

   // can't be mixed-case without more than one letter
   if(str[0] == '\0' || str[1] == '\0'){
      return 0;
   }

   // find the first alphabetical character and store its index at 'i'
   int i = 0;
   // (stop at the terminator so the scan never runs past the end of the string)
   for(; str[i] != '\0' && !isalpha((unsigned char)str[i]); ++i);

   if(str[i] == '\0') {
      // no alphabetical characters means you can't have mixed cases
      return 0;
   }

   // See if any of the other alphabetical characters differ from the case of the first one
   for(int j = i+1; str[j] != '\0' ; ++j) {
      if(isalpha((unsigned char)str[j]) && (!islower((unsigned char)str[i]) ^ !islower((unsigned char)str[j]))) {
         return 1;
      }
   }
   // didn't find any mismatches, so not mixed case
   return 0;
}

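Answer 4: (score: 0)

Another approach that assumes neither ASCII nor all-alpha input.

Evaluate the first char, then run one of 2 optimized loops.

Each loop exits on the first mismatch. Since each while() loop performs only a single test per character, the per-character work is kept to a minimum.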

#include <ctype.h>
#include <stdio.h>

void case_test(const char *s) {
  if (*s == '\0') {
    puts("Empty string");
    return;
  }

  const unsigned char *us = (const unsigned char *)s; // use unsigned char with is***() functions.
  if (islower(*us)) {
    while (islower(*us)) {
      us++;
    }
    if (*us) {
      puts("Mixed or not alpha");
    } else {
      puts("All lower");
    }
  } else if (isupper(*us)) {
    while (isupper(*us)) {
      us++;
    }
    if (*us) {
      puts("Mixed case or not alpha");
    } else {
      puts("All upper");
    }
  } else {
    puts("Not alpha");
  }
}

OP has since added cases that include non-alpha characters. The version below handles those quickly.

void case_test_with_non_letters(const char *s) {
  const unsigned char *us = (const unsigned char *)s; // use unsigned char with is***() functions.

  //  Find first alpha or null character
  while (!isalpha(*us) && *us) {
    us++;
  }

  if (*us == '\0') {
    puts("No letters"); // reached for "" and for strings such as "123"
    return;
  }

  if (islower(*us)) {
    while (!isupper(*us) && *us) {
      us++;
    }
    if (isupper(*us)) {
      puts("Mixed");
    } else {
      puts("All letters lower");
    }
  } else if (isupper(*us)) {
    while (!islower(*us) && *us) {
      us++;
    }
    if (*us) {
      puts("Mixed case");
    } else {
      puts("All letters upper");
    }
  } else {
    puts("Not alpha");
  }
}

Answer 5: (score: -1)

97 = a = 1100001
65 = A = 1000001

You only need to test the 6th bit (value 32, counting from the least significant bit as bit 1).
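A minimal sketch of that bit test, assuming ASCII and assuming every character in the string is a letter (for digits or punctuation the bit says nothing about case). The function names ascii_letter_is_lower and classify_letters_only are hypothetical, chosen only for this illustration.

```c
/* Assumes c is an ASCII letter; for any other byte the result is meaningless. */
static int ascii_letter_is_lower(unsigned char c) {
    return (c & 0x20) != 0;  /* 0x20 == 'A' ^ 'a', i.e. the 6th bit */
}

/* Returns 0 = empty, 1 = all upper, 2 = all lower, 3 = mixed.
   Only valid for strings made up entirely of ASCII letters. */
static int classify_letters_only(const char *s) {
    int seen = 0;
    /* Stop early once both cases have been seen. */
    for (; *s != '\0' && seen != 3; s++) {
        seen |= ascii_letter_is_lower((unsigned char)*s) ? 2 : 1;
    }
    return seen;
}
```

For input that may contain non-letters, such as "iTem12" or "-3nXy" from the question, the bit test alone is not enough and one of the isalpha()-based or table-based answers is the safer choice.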