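Implementing a string translation table in C

Time: 2009-08-25 15:49:10

Tags: c string search replace

I want to implement a basic search/replace translation table in C; that is, it will read a list of word pairs from a config file and go through text received at run time, replacing each source word it finds with the corresponding target word. For example, if my user input text was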

"Hello world, how are you today?"

and my config file was

world user
how why

then running the function would return

"Hello user, why are you today?"

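I could do this with a moderate amount of tedium (I'm currently looking at the glib string utility functions, since they're there), but I figure this has to be a totally solved problem in some library or other. Any pointers?

(No, this isn't homework, though I'll admit the problem does sound rather homework-like :) I'm writing a libpurple plugin, hence the pure-C requirement.)

4 answers:

Answer 0 (score: 5)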

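I was also surprised at how hard it is to find very simple string manipulation functions in C. What I was looking for is the procedural-language equivalent of the object-oriented string.replace() method. From what I can tell, that is also the essence of your question: with such a function in hand, you can add the extra code to read the file line by line and tokenize each line on whitespace.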
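The file-reading step mentioned above could look roughly like the sketch below. It assumes one "source target" pair per line and words shorter than 64 characters; `struct word_pair`, `parse_pair`, `load_pairs` and `MAX_WORD` are hypothetical names chosen for illustration, not part of any library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORD 64   /* assumed maximum word length, including '\0' */

/* Hypothetical record for one "source target" line of the config file. */
struct word_pair {
    char from[MAX_WORD];
    char to[MAX_WORD];
};

/* Parses one line such as "world user". Returns 1 on success, 0 if the
 * line does not contain two whitespace-separated words. */
int parse_pair(const char *line, struct word_pair *out) {
    assert(line != NULL && out != NULL);
    /* %63s stops at whitespace and leaves room for the terminator */
    return sscanf(line, "%63s %63s", out->from, out->to) == 2;
}

/* Reads pairs from fp until EOF or until max_pairs are stored.
 * Blank or malformed lines are skipped. Returns the number stored. */
size_t load_pairs(FILE *fp, struct word_pair *pairs, size_t max_pairs) {
    char line[256];   /* assumes config lines fit in 255 characters */
    size_t n = 0;

    while (n < max_pairs && fgets(line, sizeof line, fp) != NULL) {
        if (parse_pair(line, &pairs[n])) {
            n++;
        }
    }
    return n;
}
```

A caller would open the config file with fopen(), pass the stream to `load_pairs`, and then apply each pair to the incoming text in turn.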

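What makes such a function tricky to implement is that the best way to allocate the buffer that receives the translated string is really an application decision. You have a couple of options: 1) Have the caller pass in a buffer, and leave it to the caller to make sure the buffer is always large enough for the translated version. 2) Do the dynamic memory allocation inside the function, and require the caller to call free() on the returned pointer.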

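I chose #1, because the overhead of dynamic memory allocation was too great for my embedded application. Option 2 also requires the caller to call free() later, which is easy to forget.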
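For comparison, option 2 could be sketched as follows. This is only an illustration of the trade-off, not the approach taken in this answer: `replace_alloc` is a hypothetical name, and the sketch assumes two passes over the input are acceptable (one to count matches, one to copy).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly malloc'd copy of src with every
 * non-overlapping occurrence of search replaced by repl, or NULL on bad
 * arguments or allocation failure. The caller must free() the result. */
char *replace_alloc(const char *src, const char *search, const char *repl) {
    size_t search_len, repl_len, size, count = 0;
    const char *p, *hit;
    char *result, *out;

    if (src == NULL || search == NULL || repl == NULL || *search == '\0') {
        return NULL;
    }
    search_len = strlen(search);
    repl_len = strlen(repl);

    /* first pass: count matches so the exact size is known */
    for (p = src; (hit = strstr(p, search)) != NULL; p = hit + search_len) {
        count++;
    }

    /* each match removes search_len chars and adds repl_len chars */
    size = strlen(src) - count * search_len + count * repl_len + 1;
    result = malloc(size);
    if (result == NULL) {
        return NULL;
    }

    /* second pass: copy the unchanged text and the replacements */
    out = result;
    for (p = src; (hit = strstr(p, search)) != NULL; p = hit + search_len) {
        size_t gap = (size_t)(hit - p);
        memcpy(out, p, gap);
        out += gap;
        memcpy(out, repl, repl_len);
        out += repl_len;
    }

    /* sanity check: the tail plus terminator fills the buffer exactly */
    assert((size_t)(out - result) + strlen(p) + 1 == size);
    strcpy(out, p);
    return result;
}
```

The caller never has to guess a buffer size, but every call costs an allocation and every result needs a matching free(), which is exactly the burden described above.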

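The resulting function came out pretty ugly. I did a very quick implementation, which I've included below. It should be tested further before being used in production; I ended up taking the project in a different direction before I ever used it.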

#include <stdio.h>
#include <string.h>

/*
 * Searches an input string for every occurrence of a search string and replaces each one with
 * a replacement string. The resulting string is stored in a buffer which is passed in to the function.
 *
 * @param pDest is a buffer which the updated version of the string will be placed into. THIS MUST BE PREALLOCATED.
 * @param pDestLen is the number of chars in pDest, including room for the terminating '\0'.
 * @param pSrc is a constant string which is the original string.
 * @param pSearch is the string to search for in pSrc. It must not be empty.
 * @param pReplacement is the string that pSearch will be replaced with.
 * @return if successful it returns the number of times pSearch was replaced in the string. Otherwise it returns a
 *         negative number to indicate an error: -1 if one of the pointers passed in == NULL, pDestLen == 0 or
 *         pSearch is empty, -2 if the destination buffer is of insufficient size.
 *         Note: the value stored in pDest is undefined if an error occurs.
 */
int string_findAndReplace( char* pDest, size_t pDestLen, const char* pSrc, const char* pSearch, const char* pReplacement ) {
    size_t destIndex = 0;
    const char* next;
    const char* prev = pSrc;
    size_t copyLen;
    size_t searchLen;
    size_t replLen;
    int foundCnt = 0;

    // the original check tested pSrc twice and never tested pSearch
    if( pDest == NULL || pDestLen == 0 || pSrc == NULL || pSearch == NULL || pReplacement == NULL ) {
        return -1;
    }

    searchLen = strlen(pSearch);
    replLen = strlen(pReplacement);

    // an empty search string matches at every position and would loop forever
    if( searchLen == 0 ) {
        return -1;
    }

    while( (next = strstr( prev, pSearch )) != NULL ) {
        copyLen = (size_t)(next - prev);

        // make sure the preceding text, the replacement and a terminator still fit
        if( copyLen + replLen >= pDestLen - destIndex ) {
            return -2;
        }

        // copy chars before the search string
        memcpy( &pDest[destIndex], prev, copyLen );
        destIndex += copyLen;

        // insert the replacement
        memcpy( &pDest[destIndex], pReplacement, replLen );
        destIndex += replLen;

        // continue searching after the matched text
        prev = next + searchLen;
        foundCnt++;
    }

    // copy what's left from prev to the end of dest.
    copyLen = strlen(prev);
    if( copyLen >= pDestLen - destIndex ) {
        return -2;
    }
    memcpy( &pDest[destIndex], prev, copyLen + 1 ); // +1 copies the terminating '\0'.

    return foundCnt;
}


// --------- VERY BASIC TEST HARNESS FOR THE METHOD ABOVE --------------- // 

#define NUM_TESTS 8

// Very rudimentary test harness for the string_findAndReplace method.
int main(int argsc, char** argsv) {

int i=0;
char newString[1000];

char input[][1000] = { 
"Emergency condition has been resolved. The all clear has been issued.",
"Emergency condition has been resolved and the all clear has been issued.",
"lions, tigers, and bears",
"and something, and another thing and",
"too many commas,, and, also androids",
" and and and,, and and ",
"Avoid doors, windows and large open rooms.",
"Avoid doors and windows."

};

char output[][1000] = { 
"Emergency condition has been resolved. The all clear has been issued.",
"Emergency condition has been resolved, and the all clear has been issued.",
"lions, tigers,, and bears",
"and something,, and another thing and",
"too many commas,, and, also androids",
// matches cannot overlap: the space that ends one match is consumed, so only two replacements occur here
", and and and,,, and and ",
"Avoid doors, windows, and large open rooms.",
"Avoid doors, and windows."
};

    char searchFor[] = " and ";
    char replaceWith[] = ", and ";

    printf("String replacer\r\n");

    for( i=0; i< NUM_TESTS; i++ ) {

        string_findAndReplace( newString, sizeof( newString ), input[i], searchFor, replaceWith );

        if( strcmp( newString, output[i] ) == 0 ) {
            printf("SUCCESS\r\n\r\n");
        }
        else {
            printf("FAILED: \r\n IN :'%s'\r\n OUT:'%s'\r\n EXP:'%s'\r\n\r\n", input[i],newString,output[i]);
        }

    }

    printf("\r\nDONE.\r\n");
    return 0;
}

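Answer 1 (score: 1)

If you didn't have the config file requirement, you could get (f)lex to generate the C code for you. But that would mean recompiling every time the list of word pairs changes.

Maybe this is a bit overkill, but you could store each word in a node of a linked list. That makes it very easy to build a new sentence by changing and replacing words.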
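The linked-list idea could be sketched like this. It assumes a "word" is a run of alphanumeric characters and keeps the spaces and punctuation between words as nodes of their own so the sentence can be rebuilt exactly; `struct token`, `tokenize`, `replace_word`, `join_tokens` and `free_tokens` are hypothetical names for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node: one word, or one run of non-word characters. */
struct token {
    char *text;
    struct token *next;
};

static int is_word_char(char c) {
    return isalnum((unsigned char)c) != 0;
}

/* Copies len chars of s into a new NUL-terminated string. */
static char *copy_span(const char *s, size_t len) {
    char *t = malloc(len + 1);
    if (t != NULL) {
        memcpy(t, s, len);
        t[len] = '\0';
    }
    return t;
}

void free_tokens(struct token *head) {
    while (head != NULL) {
        struct token *next = head->next;
        free(head->text);
        free(head);
        head = next;
    }
}

/* Splits s into alternating word / separator nodes. Returns NULL for an
 * empty string or on allocation failure. */
struct token *tokenize(const char *s) {
    struct token *head = NULL, **tail = &head;

    assert(s != NULL);
    while (*s != '\0') {
        int word = is_word_char(*s);
        size_t len = 1;
        struct token *node;

        while (s[len] != '\0' && is_word_char(s[len]) == word) {
            len++;
        }
        node = malloc(sizeof *node);
        if (node == NULL) { free_tokens(head); return NULL; }
        node->text = copy_span(s, len);
        node->next = NULL;
        if (node->text == NULL) { free(node); free_tokens(head); return NULL; }
        *tail = node;          /* append at the end of the list */
        tail = &node->next;
        s += len;
    }
    return head;
}

/* Replaces the text of every node equal to `from`. Returns the number of
 * nodes changed, or -1 on allocation failure. */
int replace_word(struct token *head, const char *from, const char *to) {
    int count = 0;
    for (; head != NULL; head = head->next) {
        if (strcmp(head->text, from) == 0) {
            char *copy = copy_span(to, strlen(to));
            if (copy == NULL) return -1;
            free(head->text);
            head->text = copy;
            count++;
        }
    }
    return count;
}

/* Joins the nodes back into one malloc'd string; the caller frees it. */
char *join_tokens(const struct token *head) {
    const struct token *t;
    size_t total = 0, pos = 0;
    char *out;

    for (t = head; t != NULL; t = t->next) total += strlen(t->text);
    out = malloc(total + 1);
    if (out == NULL) return NULL;
    for (t = head; t != NULL; t = t->next) {
        size_t len = strlen(t->text);
        memcpy(out + pos, t->text, len);
        pos += len;
    }
    out[pos] = '\0';
    return out;
}
```

Because only whole nodes are compared, "how" is replaced but the "how" inside "show" is left alone, which a plain substring search would not guarantee. Applying pairs one after another can still chain (a target word that is also a source word gets replaced again), so a real table lookup would probably visit each node once.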

Answer 2 (score: 0)

You could take a look at GNU gettext. (See also the Wikipedia article.)

Answer 3 (score: 0)