Normalize a char array and remove the leftover characters (truncate)

Time: 2014-02-18 23:16:27

Tags: c arrays normalization truncate

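I have the following code to normalize a char array. When the procedure finishes, the normalized file still has some of the old output left over at the end. This happens because i reaches the end of the array before j does. That makes sense, but how do I remove the extra characters? I come from Java, so I apologize if I am making mistakes that seem simple. Here is my code: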

/* The normalize procedure normalizes a character array of size len 
   according to the following rules:
     1) turn all upper case letters into lower case ones
     2) turn any white-space character into a space character and, 
        shrink any n>1 consecutive whitespace characters to exactly 1 whitespace

     When the procedure returns, the character array buf contains the newly 
     normalized string and the return value is the new length of the normalized string.

     hint: you may want to use C library function isupper, isspace, tolower
     do "man isupper"
*/
int
normalize(unsigned char *buf,   /* The character array contains the string to be normalized*/
                    int len     /* the size of the original character array */)
{
    /* use a for loop to cycle through each character and the built-in C functions to analyze it */
    int i = 0;
    int j = 0;
    int k = len;

    if(isspace(buf[0])){
        i++;
        k--;
    }
    if(isspace(buf[len-1])){
        i++;
        k--;
    }
    for(i;i < len;i++){
        if(islower(buf[i])) {
            buf[j]=buf[i];
            j++;
        }
        if(isupper(buf[i])) {
            buf[j]=tolower(buf[i]);
            j++;
        }
        if(isspace(buf[i]) && !isspace(buf[j-1])) {
            buf[j]=' ';
            j++;
        }
        if(isspace(buf[i]) && isspace(buf[i+1])){
            i++;
            k--;
        }
    }

   return k;

}

Here is some sample output:

  

halb mwqcnfuokuqhuhy ja mdqu nzskzkdkywqsfbs zwb lyvli HALB   MwQcnfuOKuQhuhy Ja mDQU nZSkZkDkYWqsfBS ZWb lyVLi

As you can see, the end part is repeated: both the new normalized data and the old, un-normalized data are present in the result. How can I fix this?

3 answers:

Answer 0: (score: 2)

Add a null terminator

/* newLength is the number of characters written (j in the question's code) */
buf[newLength] = '\0';
return newLength;
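
To illustrate why the old data stays visible, here is a minimal sketch built on a hypothetical helper, `drop_x`, which is not part of the question's code. It compacts a buffer in place the same way: a read index `i` runs ahead of a write index `j`, so the bytes between `j` and the old end keep their old contents until a terminator is written at `j`. `truncate_at` and its `cap` parameter are also assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the question): removes every 'x' from buf
   in place. i is the read index, j is the write index. */
static size_t drop_x(char *buf, size_t len)
{
    size_t j = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 'x')
            buf[j++] = buf[i];   /* kept bytes slide toward the front */
    }
    /* Bytes from j up to len-1 still hold old data at this point. */
    return j;
}

/* Hypothetical helper: cuts the string off at new_len.
   cap is the total size of buf in bytes, so new_len must be below it. */
static void truncate_at(char *buf, size_t cap, size_t new_len)
{
    assert(new_len < cap);
    buf[new_len] = '\0';
}
```

With the buffer `"axbxc"`, `drop_x` returns 3 and leaves `abcxc` in memory. After `truncate_at(buf, sizeof buf, 3)` the string reads `abc`.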

Answer 1: (score: 0)

Fix it like this

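#include <ctype.h>  /* isupper, isspace, tolower */

/* Note: buf must have room for len + 1 bytes, because the terminator
   is written at buf[len] when no characters are removed. */
int normalize(unsigned char *buf, int len) {
    int i, j;   /* i reads, j writes */

    for(j=i=0 ;i < len; ++i){
        if(isupper(buf[i])) {
            buf[j++]=tolower(buf[i]);
            continue ;
        }
        if(isspace(buf[i])){
            /* emit a single space per run of whitespace */
            if(!j || buf[j-1] != ' ')
                buf[j++]=' ';
            continue ;
        }
        buf[j++] = buf[i];
    }
    buf[j] = '\0';   /* cut off the stale tail */

    return j;        /* new length */
}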
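
The question's code also appears to try to skip whitespace at the very start and end of the buffer, which the stated rules do not strictly require. Assuming that trimming is wanted, a hedged variant might look like the sketch below. `normalize_trim` is a hypothetical name. Unlike the version above, it writes no terminator, so it never touches `buf[len]`, and the caller decides how to use the returned length.

```c
#include <ctype.h>

/* Hypothetical variant (not from the answer): lowercases letters,
   collapses each whitespace run to one space, and drops leading and
   trailing whitespace. Returns the new length; writes no terminator. */
static int normalize_trim(unsigned char *buf, int len)
{
    int j = 0;   /* write index */

    for (int i = 0; i < len; i++) {
        if (isspace(buf[i])) {
            /* Emit a space only after a non-space character, which
               skips leading whitespace and collapses runs. */
            if (j > 0 && buf[j - 1] != ' ')
                buf[j++] = ' ';
        } else {
            /* tolower leaves non-uppercase characters unchanged */
            buf[j++] = (unsigned char)tolower(buf[i]);
        }
    }

    /* Trailing whitespace leaves exactly one space at the end; drop it. */
    if (j > 0 && buf[j - 1] == ' ')
        j--;

    return j;
}
```

For the 18-byte input `"  HeLLo \t\n WORLD  "` this returns 11, and the first 11 bytes of the buffer are `hello world`.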

Answer 2: (score: 0)

Or add a null terminator

    /* use '\0' rather than NULL: NULL is a pointer constant, not a character */
    buf[newLength] = '\0';
    return newLength;
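
If the caller writes the buffer to a file, as the question suggests, another option is to skip the terminator and write only the number of bytes the function returned. The sketch below assumes the output goes through a `FILE *`. `write_normalized` is a hypothetical helper, not something from the question.

```c
#include <stdio.h>

/* Hypothetical helper: writes only the first new_len bytes of buf to out,
   so any stale bytes after the normalized part never reach the file.
   Returns 0 on success, -1 if fewer bytes were written than requested. */
static int write_normalized(FILE *out, const unsigned char *buf, size_t new_len)
{
    size_t written = fwrite(buf, 1, new_len, out);
    return written == new_len ? 0 : -1;
}
```

The caller would pass the value returned by the normalize function as `new_len`, rather than the original `len`.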