Question

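I'm new to C programming and I'm trying to write a simple function that normalizes a char array. At the end I want to return the length of the new char array. I'm coming from Java, so I apologize if I'm making mistakes that seem simple. I have the following code: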

/* The normalize procedure normalizes a character array of size len 
   according to the following rules:
     1) turn all upper case letters into lower case ones
     2) turn any white-space character into a space character and, 
        shrink any n>1 consecutive whitespace characters to exactly 1 whitespace

     When the procedure returns, the character array buf contains the newly 
     normalized string and the return value is the new length of the normalized string.

*/
int
normalize(unsigned char *buf,   /* The character array contains the string to be normalized*/
                    int len     /* the size of the original character array */)
{
    /* use a for loop to cycle through each character and the built in c functions to analyze it */
    int i;

if(isspace(buf[0])){
    buf[0] = "";
}
if(isspace(buf[len-1])){
    buf[len-1] = "";
}

    for(i = 0;i < len;i++){
        if(isupper(buf[i])) {
            buf[i]=tolower(buf[i]);
        }
        if(isspace(buf[i])) {
            buf[i]=" ";
        }
        if(isspace(buf[i]) && isspace(buf[i+1])){
            buf[i]="";
        }
    }

    return strlen(*buf);


}

How do I return the length of the char array at the end? Also, does my program correctly do what I want it to do?

EDIT: I have made some corrections to my program based on the comments. Is it correct now?

/* The normalize procedure normalizes a character array of size len 
   according to the following rules:
     1) turn all upper case letters into lower case ones
     2) turn any white-space character into a space character and, 
        shrink any n>1 consecutive whitespace characters to exactly 1 whitespace

     When the procedure returns, the character array buf contains the newly 
     normalized string and the return value is the new length of the normalized string.

*/
int
normalize(unsigned char *buf,   /* The character array contains the string to be normalized*/
                    int len     /* the size of the original character array */)
{
    /* use a for loop to cycle through each character and the built in c funstions to analyze it */
    int i = 0;
    int j = 0;

    if(isspace(buf[0])){
        //buf[0] = "";
        i++;
    }
    if(isspace(buf[len-1])){
        //buf[len-1] = "";
        i++;
    }
    for(i;i < len;i++){
        if(isupper(buf[i])) {
            buf[j]=tolower(buf[i]);
            j++;
        }
        if(isspace(buf[i])) {
            buf[j]=' ';
            j++;
        }
        if(isspace(buf[i]) && isspace(buf[i+1])){
            //buf[i]="";
            i++;
        }
    }

    return strlen(buf);


}

Answer 1

Notations like:

 buf[i]=" ";
 buf[i]="";

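do not do what you think or expect. You will probably need two indexes to step through the array, one for the current read position and one for the current write position, both initially zero. When you want to delete a character, you do not advance the write position.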

Warning: untested code.

int i, j;
for (i = 0, j = 0; i < len; i++)
{
    if (isupper(buf[i]))
        buf[j++] = tolower(buf[i]);
    else if (isspace(buf[i]))
    {
        buf[j++] = ' ';
        while (i+1 < len && isspace(buf[i+1]))
            i++;
    }
    else
        buf[j++] = buf[i];
}
buf[j] = '\0';  // Null terminate

Replace arbitrary white space with a blank using:

buf[i] = ' ';

Then return with:

return strlen(buf);

Or, using the code above:

return j;

Answer 2

if(isspace(buf[i])) {
    buf[i]=" ";
}

This should be buf[i] = ' ', not buf[i] = " ". You can't assign a string to a character.
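To make the difference concrete, here is a minimal sketch. The helper names put_space and space_string_size are hypothetical and exist only for illustration. A character constant such as ' ' is a single value, while the string literal " " is an array of two chars (the space plus the terminating '\0').

```c
#include <stddef.h>

/* Hypothetical helper: store a single space character in buf[i]. */
void put_space(unsigned char *buf, size_t i)
{
    buf[i] = ' ';   /* character constant: one value, fits in one element */
    /* buf[i] = " "; would instead try to store a pointer to a 2-char array */
}

/* Hypothetical helper: size in bytes of the string literal " ",
   i.e. the space plus the '\0' terminator. */
size_t space_string_size(void)
{
    return sizeof " ";
}
```

Most compilers should at least warn about the commented-out assignment, since it converts a pointer to an integer type.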

if(isspace(buf[i]) && isspace(buf[i+1])){
    buf[i]="";
}

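This has two problems. One is that you are not checking whether i < len - 1, so buf[i + 1] could be past the end of the string. The other is that buf[i] = "" won't do what you want at all. To remove a character from a string, you need to use memmove to shift the remaining contents of the string to the left.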
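As a rough sketch of the memmove approach, assuming the buffer holds a NUL-terminated string (the helper name remove_char_at is hypothetical, not something from the question):

```c
#include <string.h>

/* Hypothetical helper: remove the character at index pos from the
   NUL-terminated string s by shifting the tail one place to the left. */
void remove_char_at(char *s, size_t pos)
{
    size_t n = strlen(s);
    if (pos >= n)
        return;     /* index is outside the string: nothing to remove */
    /* Move n - pos bytes: the characters after pos plus the '\0'.
       memmove is used because source and destination overlap. */
    memmove(s + pos, s + pos + 1, n - pos);
}
```

Calling this once per removed character makes the whole pass quadratic in the worst case, which is one reason the read-index/write-index approach is usually preferred for this task.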

return strlen(*buf);

This should be return strlen(buf). *buf is a character, not a string.

Answer 3

A few mistakes in your code:

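You cannot assign a string such as "" or " " to buf[i], because the type of buf[i] is char and the type of a string is char*.
You are reading from buf and writing into buf using the same index i. This is a problem because you want to eliminate consecutive white space. You should therefore use one index for reading and another index for writing.
In C/C++, a native string is an array of characters terminated by 0. So in essence, you can simply iterate over buf until you read 0 (you don't need the len variable at all). In addition, since you are "truncating" the input string, you should set the new last character to 0.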

Here is one possible solution for the problem at hand:

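int normalize(char* buf)
{
    int i = 0;  /* read index */
    int j = 0;  /* write index */
    while (buf[i] != 0)
    {
        /* cast so that isspace/tolower never see a negative value */
        unsigned char c = (unsigned char)buf[i++];
        if (isspace(c))
        {
            buf[j++] = ' ';
            /* skip the rest of the white-space run; stops at the 0 */
            while (isspace((unsigned char)buf[i]))
                i++;
        }
        else
            buf[j++] = (char)tolower(c);  /* non-letters pass through unchanged */
    }
    buf[j] = 0;
    return j;
}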

Answer 4

The canonical way to do something like this is to use two indices, one for reading and one for writing. Like this:

int normalizeString(char* buf, int len) {
    int readPosition, writePosition;
    bool hadWhitespace = false;
    for(readPosition = writePosition = 0; readPosition < len; readPosition++) {
        if(isspace(buf[readPosition])) {
            if(!hadWhitespace) buf[writePosition++] = ' ';
            hadWhitespace = true;
        } else if(...) {
            ...
        }
    }
    return writePosition;
}

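Warning: this handles the string only according to the given length. While buffer + length has the advantage of being able to handle any data, it is not how C strings work. C strings are terminated by a null byte at their end, and it is your job to make sure that the null byte is in the correct position. The code you gave does not handle the null byte, and neither does the buffer + length version I gave above. A correct C implementation of such a normalization function would look like this: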

int normalizeString(char* string) {    //No length is passed, it is implicit in the null byte.
    char* in = string, *out = string;
    bool hadWhitespace = false;
    for(; *in; in++) {    //loop until the zero byte is encountered
        if(isspace(*in)) {
            if(!hadWhitespace) *out++ = ' ';
            hadWhitespace = true;
        } else if(...) {
            ...
        }
    }
    *out = 0;    //add a new zero byte
    return out - string;    //use pointer arithmetic to retrieve the new length
}

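In this code, I replaced the indices with pointers because it was convenient. This is purely a matter of style preference; I could have written the same thing with explicit indices. (My style preference is not for pointer iteration, but for concise code.)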
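For comparison, here is a sketch of how the null-terminated version might look with explicit indices. The function name normalizeStringIdx is hypothetical, and the body of the branch that the answer leaves elided is an assumption: it lowercases every other character and resets the whitespace flag.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical index-based variant; the else branch is an assumed
   completion of the part the answer leaves out. */
int normalizeStringIdx(char *string)
{
    int in = 0, out = 0;            /* read and write positions */
    bool hadWhitespace = false;
    for (; string[in] != '\0'; in++) {
        /* cast so the <ctype.h> functions never see a negative value */
        unsigned char c = (unsigned char)string[in];
        if (isspace(c)) {
            if (!hadWhitespace) string[out++] = ' ';
            hadWhitespace = true;
        } else {
            string[out++] = (char)tolower(c);   /* assumption: lowercase the rest */
            hadWhitespace = false;              /* assumption: a new run may start later */
        }
    }
    string[out] = '\0';             /* re-terminate the shortened string */
    return out;                     /* new length */
}
```

Under these assumptions, calling it on a writable array holding "Hello \t\n World" should leave "hello world" in the array and return 11.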

Answer 5

You should write:

return strlen(buf)

instead of:

return strlen(*buf)

Reason:

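buf is of type char *; it is the address of some char in memory (the one at the start of the string). The string is null-terminated (or at least it should be), so the function strlen knows when to stop counting chars.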

*buf dereferences the pointer, yielding a single char, which is not what strlen expects.
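A small sketch of the corrected call, assuming the buffer is null-terminated. The helper name normalized_length is hypothetical. Because the question declares buf as unsigned char *, while strlen takes const char *, a cast is shown here to avoid a pointer-signedness diagnostic.

```c
#include <string.h>

/* Hypothetical helper: length of the NUL-terminated string that starts at buf. */
size_t normalized_length(const unsigned char *buf)
{
    /* Pass the pointer itself, not *buf (which is just the first character).
       The cast is needed because strlen expects const char *. */
    return strlen((const char *)buf);
}
```

Note that strlen returns size_t, so returning its result from a function declared int involves an implicit conversion.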

Answer 6

Not much different from the others, but this assumes an array of unsigned char rather than a C string.

tolower() does not need a preceding isupper() test.
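The reason is that tolower returns its argument unchanged when there is no lowercase mapping for it. A minimal sketch comparing the two forms (both helper names are hypothetical):

```c
#include <ctype.h>

/* Hypothetical helper: lowercase with an explicit isupper() guard. */
int lower_checked(int c)
{
    return isupper(c) ? tolower(c) : c;
}

/* Hypothetical helper: lowercase without the guard. */
int lower_direct(int c)
{
    return tolower(c);  /* non-uppercase values come back unchanged */
}
```

Both functions should agree for every value in the range of unsigned char, which is the range for which the <ctype.h> functions are defined (plus EOF).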

int normalize(unsigned char *buf, int len) {
  int i = 0;
  int j = 0;
  int previous_is_space = 0;
  while (i < len) {
    if (isspace(buf[i])) {
      if (!previous_is_space) {
        buf[j++] = ' ';
      }
      previous_is_space = 1;
    } else {
      buf[j++] = tolower(buf[i]);
      previous_is_space = 0;
    }
    i++;
  }
  return j;
}

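@OP:
The posted code implies that leading and trailing white space should either be shrunk to 1 char or be eliminated entirely.
The answer above simply shrinks leading and trailing white space to 1 ' '. To eliminate trailing and leading white space: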

int i = 0;
int j = 0;
while (len > 0 && isspace(buf[len-1])) len--;
while (i < len && isspace(buf[i])) i++;
int previous_is_space = 0;
while (i < len) { ...
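Putting the fragment together with the loop from the first version gives something like the sketch below. The function name normalize_trimmed is hypothetical, and the loop body is assumed to be the same as in the version above.

```c
#include <ctype.h>

/* Hypothetical combined version: trims leading and trailing white space,
   collapses inner runs to one ' ', and lowercases everything else.
   Works on a length-delimited buffer; no '\0' is read or written. */
int normalize_trimmed(unsigned char *buf, int len)
{
    int i = 0;
    int j = 0;
    while (len > 0 && isspace(buf[len - 1])) len--;   /* drop trailing white space */
    while (i < len && isspace(buf[i])) i++;           /* skip leading white space */
    int previous_is_space = 0;
    while (i < len) {
        if (isspace(buf[i])) {
            if (!previous_is_space) {
                buf[j++] = ' ';
            }
            previous_is_space = 1;
        } else {
            buf[j++] = tolower(buf[i]);
            previous_is_space = 0;
        }
        i++;
    }
    return j;   /* number of valid bytes now at the start of buf */
}
```

Since the function does not write a terminator, the caller has to use the returned length (or add the '\0' itself if the buffer has room) before treating the result as a C string.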

Returning the length of a char array in C
