Question

The program I'm writing reads text files made up of strings, one string per line. Basically I do this:

...
char* name;
char* buffer = malloc(sizeof(char) * SIZE); // SIZE is a constant defined in the header
while(fgets(buffer, SIZE, pf)){ // pf is the opened stream
    // Cut the string at the newline; strtok(buffer, "\n") would return NULL for an empty line
    buffer[strcspn(buffer, "\n")] = '\0';
    name = malloc(sizeof(char) * SIZE);
    strcpy(name, buffer);
    manipulate(name); // call an extern function
}

The function manipulate is declared like this:

void manipulate(void* ptr);

The problem is that, done this way, two equal strings end up at different memory addresses, so manipulate() treats them as two different elements.
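A minimal illustration of the distinction, in plain C: == on two pointers compares addresses, while strcmp() compares characters. The helper names copy_string, same_address and same_contents are hypothetical, chosen only for this sketch.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: make an independent heap copy of s, the way the
   loop in the question does with malloc + strcpy. NULL if malloc fails. */
char *copy_string(const char *s) {
    char *p = malloc(strlen(s) + 1);
    if (p != NULL)
        strcpy(p, s);
    return p;
}

/* Identity: do a and b point at the same object in memory? */
bool same_address(const void *a, const void *b) {
    return a == b;
}

/* Equality: do a and b hold the same characters? */
bool same_contents(const char *a, const char *b) {
    return strcmp(a, b) == 0;
}
```

With a = copy_string("foo") and b = copy_string("foo"), same_address(a, b) is false while same_contents(a, b) is true, which is exactly the situation described in the question.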

How can I get them recognized as a single element?

Answer 1

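Store the strings in a set, a data type that does not store duplicate values and is fast to search. Basically it's a hash table where the keys are the strings and the values don't matter.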
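To make the "hash table keyed by the strings" idea concrete without any library, here is a minimal sketch using only the C standard library: a chained hash table with a fixed number of buckets (no resizing) and the djb2 hash. StringSet, set_intern() and set_free() are hypothetical names for this illustration, not part of any library.

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 1024 /* fixed bucket count; this sketch never resizes */

typedef struct Entry {
    char *str;          /* canonical copy, owned by the set */
    struct Entry *next; /* next entry in the same bucket */
} Entry;

typedef struct {
    Entry *buckets[NBUCKETS];
} StringSet;

/* djb2 string hash (Dan Bernstein). */
static unsigned long hash_str(const char *s) {
    unsigned long h = 5381;
    unsigned char c;
    while ((c = (unsigned char)*s++) != 0)
        h = h * 33 + c;
    return h;
}

/* Return the set's canonical pointer for s, storing a copy if s is new.
   Equal strings always yield the same pointer. NULL if malloc fails. */
const char *set_intern(StringSet *set, const char *s) {
    Entry **head = &set->buckets[hash_str(s) % NBUCKETS];
    for (Entry *e = *head; e != NULL; e = e->next)
        if (strcmp(e->str, s) == 0)
            return e->str; /* seen before: reuse the stored copy */

    Entry *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    size_t len = strlen(s) + 1;
    node->str = malloc(len);
    if (node->str == NULL) {
        free(node);
        return NULL;
    }
    memcpy(node->str, s, len);
    node->next = *head; /* push onto the front of the bucket's chain */
    *head = node;
    return node->str;
}

/* Free every stored string and entry, leaving the set empty. */
void set_free(StringSet *set) {
    for (size_t i = 0; i < NBUCKETS; i++) {
        Entry *e = set->buckets[i];
        while (e != NULL) {
            Entry *next = e->next;
            free(e->str);
            free(e);
            e = next;
        }
        set->buckets[i] = NULL;
    }
}
```

A StringSet must start zeroed (for example StringSet set = {{NULL}};). Passing the pointer returned by set_intern(&set, line) on to manipulate() means equal lines always arrive as the same address. A production table would also grow its bucket array as it fills.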

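You could write your own hash table, which is a good exercise, but for production code you're better off using an existing one such as GLib's. It already has convenience functions for using a hash table as a set. While we're at it, we can also use its g_strchomp() and g_strdup().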

#include <stdio.h>
#include <glib.h>

int main () {
    // Initialize our set of strings.
    GHashTable *set = g_hash_table_new(g_str_hash, g_str_equal);

    // Allocate a line buffer on the stack.
    char line[1024];

    // Read lines from stdin.
    while(fgets(line, sizeof(line), stdin)) {
        // Strip trailing whitespace, including the newline.
        g_strchomp(line);

        // Look up the string. g_hash_table_add() stores each key as its own value, so a hit returns the stored copy.
        char *string = g_hash_table_lookup(set, line);
        if( string == NULL ) {
            // Haven't seen this string before.
            // Copy it, using only the memory we need.
            string = g_strdup(line);
            // Add it to the set.
            g_hash_table_add(set, string);
        }

        printf("%p - %s\n", (void *)string, string); // %p expects a void *
    }
}

Here is a quick demo.

$ ./test
foo
0x60200000bd90 - foo
foo
0x60200000bd90 - foo
bar
0x60200000bd70 - bar
baz
0x60200000bd50 - baz
aldskflkajd
0x60200000bd30 - aldskflkajd
aldskflkajd
0x60200000bd30 - aldskflkajd

Answer 2

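If you really do have two strings, they necessarily have different addresses, whether or not their contents are the same. It sounds like you want to keep track of the strings you have already read, so you can avoid or merge duplicates. First, the "keeping track" part.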

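Obviously, you need some kind of data structure to record the strings you have already read. You have many options, each with its own advantages and disadvantages. If the number of distinct strings you need to handle is relatively small, a simple array or linked list will do, but if it is large enough, a hash table will perform better.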
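For the case where the number of distinct strings stays small, a sketch of the simple-array option: a growable array of stored copies, searched linearly (O(n) per lookup). StringList, list_intern() and list_free() are hypothetical names for illustration. realloc() may move the array of pointers, but each string stays where it was allocated, so pointers handed out earlier remain valid.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable array of canonical strings. */
typedef struct {
    char **items;    /* one heap copy per distinct string */
    size_t count;    /* number of strings stored */
    size_t capacity; /* allocated slots in items */
} StringList;

/* Return the stored copy equal to s, appending a new copy if none exists.
   Returns NULL if memory runs out. */
const char *list_intern(StringList *list, const char *s) {
    for (size_t i = 0; i < list->count; i++)
        if (strcmp(list->items[i], s) == 0)
            return list->items[i]; /* already read: reuse it */

    if (list->count == list->capacity) { /* grow: 8, 16, 32, ... */
        size_t cap = list->capacity ? list->capacity * 2 : 8;
        char **items = realloc(list->items, cap * sizeof *items);
        if (items == NULL)
            return NULL;
        list->items = items;
        list->capacity = cap;
    }
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, len);
    list->items[list->count++] = copy;
    return copy;
}

/* Free all stored strings and the array itself. */
void list_free(StringList *list) {
    for (size_t i = 0; i < list->count; i++)
        free(list->items[i]);
    free(list->items);
    list->items = NULL;
    list->count = list->capacity = 0;
}
```

A list starts empty as StringList list = {NULL, 0, 0};. Once the distinct strings number in the thousands, the linear scan becomes the bottleneck and a hash table is the better choice.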

With that in place, you can check each newly read string against the ones read before and act accordingly.
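A sketch of how that check could slot into the reading loop from the question, under stated assumptions: each line is looked up among the strings already read, and manipulate() receives the stored copy, so equal lines reach it as the same pointer; with skip_duplicates set, it is called only on a string's first occurrence. Seen, seen_lookup_or_add(), read_distinct(), the cap MAX_DISTINCT = 1024 and SIZE = 256 are all hypothetical choices for illustration; the question's real SIZE value is not shown.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SIZE 256          /* assumed line-buffer size */
#define MAX_DISTINCT 1024 /* assumed cap on distinct strings */

/* Hypothetical record of strings already read (linear search). */
typedef struct {
    char *items[MAX_DISTINCT]; /* one heap copy per distinct string */
    size_t count;
} Seen;

/* Return the stored copy equal to s, or store a new copy of s.
   *is_new is set to 1 for a first occurrence, else 0. NULL on failure. */
static char *seen_lookup_or_add(Seen *seen, const char *s, int *is_new) {
    for (size_t i = 0; i < seen->count; i++) {
        if (strcmp(seen->items[i], s) == 0) {
            *is_new = 0;
            return seen->items[i];
        }
    }
    if (seen->count == MAX_DISTINCT)
        return NULL; /* table full */
    char *copy = malloc(strlen(s) + 1);
    if (copy == NULL)
        return NULL;
    strcpy(copy, s);
    seen->items[seen->count++] = copy;
    *is_new = 1;
    return copy;
}

/* Read pf line by line and hand manipulate() one shared pointer per
   distinct line. Lines longer than SIZE - 1 are split by fgets().
   Returns the number of distinct strings seen, or -1 on error. */
long read_distinct(FILE *pf, Seen *seen, void (*manipulate)(void *),
                   int skip_duplicates) {
    char buffer[SIZE];
    while (fgets(buffer, sizeof buffer, pf)) {
        /* Cut at the newline; unlike strtok(buffer, "\n"), this leaves
           "" rather than NULL for an empty line. */
        buffer[strcspn(buffer, "\n")] = '\0';
        int is_new;
        char *name = seen_lookup_or_add(seen, buffer, &is_new);
        if (name == NULL)
            return -1;
        if (is_new || !skip_duplicates)
            manipulate(name); /* equal lines -> same pointer */
    }
    return (long)seen->count;
}
```

Passing the stored copy keeps every occurrence visible to manipulate() while making equal strings share an address; skipping duplicates instead suits callers that only care about the distinct values. A Seen must start zeroed, for example by declaring it static.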

Coalescing equal strings onto the same pointer
