I have a data file:
C0001|H|Espresso Classics|The traditional espresso favourites.
C0002|H|Espresso Espresions|Delicious blend of espresso, milk, and luscious flavours.
C0003|H|Tea & Cocoa|Gloria Jean's uses only the finest cocoas and single garden teas. Always satisfying.
C0004|C|Iced Chocolate|Gloria Jean's version of a traditional favourite.
C0005|C|Mocha Chillers|An icy blend of chocolate, coffee, milk and delicious mix-ins.
C0006|C|Espresso Chillers|A creamy blend of fresh espresso, chocolate, milk, ice, and flavours.
C0007|C|On Ice|Cool refreshing Gloria Jean's creations over ice.
and the following code to tokenize it:
#define MAX_CAT_TOK 4
#define DATA_DELIM "|"
char *token[100];
for(i = 0; i < MAX_CAT_TOK; i++)
{
    if(i == 0) token[i] = strtok(input, DATA_DELIM);
    else token[i] = strtok(NULL, DATA_DELIM);
    printf("%s\n", token[i]);
}
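The problem is that once a string that follows a longer string is printed, data from the longer string shows up at the end of the shorter string. I assume this has something to do with string termination??
Can anyone see what I'm doing wrong here?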
Answer 0 (score: 3)
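It sounds like what is happening is that your buffer input is not being null-terminated properly. If, say, it starts out as all zeros, the first line processed will be fine. If a longer input is then stored in it, that will still work. But when an entry shorter than the previous one is stored in it (for example, line 4 in your sample), it can cause problems if it is not null-terminated.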
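For example, if the new data is copied in with memcpy and the null terminator is not included, then tokenizing the 4th item on that line will pick up the earlier data.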
If that is the case, the solution is to make sure that input is properly null-terminated.
The following tries to show what I mean:
strcpy( input, "a|b|c|some long data" );
tokenize( input ); // where tokenize is the logic shown in the OP calling strtok
// note the use of memcpy here rather than strcpy to show the idea
// and also note that it copies exactly 11 characters (doesn't include the null)
memcpy( input, "1|2|3|short", 11 );
tokenize( input );
In the contrived example above, the 4th item in the second tokenization is: shortlong data
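Edit: In other words, the problem does not seem to be in the code shown in the OP. It is in how input is populated. If you add a printf before the for loop to show the actual data being parsed, you will probably find that it is not properly null-terminated. Line 4 will probably show that it contains remnants of the previous line: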
printf( "%s\n", input );
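Answer 1 (score: 2)
I can't see anything wrong.
I took the liberty of writing a compilable version of your code and putting it on ideone. Compare it with your version...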
#include <stdio.h>
#include <string.h>
int main(void) {
    int i, j;
    char *token[100];
    char *input;
    char inputs[7][300] = {
        "C0001|H|Espresso Classics|The traditional espresso favourites.",
        "C0002|H|Espresso Espresions|Delicious blend of espresso, milk, and luscious flavours.",
        "C0003|H|Tea & Cocoa|Gloria Jean's uses only the finest cocoas and single garden teas. Always satisfying.",
        "C0004|C|Iced Chocolate|Gloria Jean's version of a traditional favourite.",
        "C0005|C|Mocha Chillers|An icy blend of chocolate, coffee, milk and delicious mix-ins.",
        "C0006|C|Espresso Chillers|A creamy blend of fresh espresso, chocolate, milk, ice, and flavours.",
        "C0007|C|On Ice|Cool refreshing Gloria Jean's creations over ice.",
    };
    for (j = 0; j < 7; j++) {
        input = inputs[j];
        for (i = 0; i < 4; i++) {
            if (i == 0) {
                token[i] = strtok(input, "|");
            } else {
                token[i] = strtok(NULL, "|");
            }
            printf("%s\n", token[i]);
        }
    }
    return 0;
}
Answer 2 (score: 0)
Here is my working code:
#include <string.h>
#include <stdio.h>
//#define DATA_DELIM "|"
#define DATA_DELIM "|\n"
int main(void)
{
    enum { LINE_LENGTH = 4096 };
    char input[LINE_LENGTH];
#define MAX_CAT_TOK 4
    char *token[100];
    while (fgets(input, sizeof(input), stdin) != 0)
    {
        printf("Input: %s", input);
        for (int i = 0; i < MAX_CAT_TOK; i++)
        {
            if (i == 0)
                token[i] = strtok(input, DATA_DELIM);
            else
                token[i] = strtok(NULL, DATA_DELIM);
            printf("%d: %s\n", i, token[i] != 0 ? token[i] : "<<NULL POINTER>>");
        }
    }
    return 0;
}
With the given data, I get:
Input: C0001|H|Espresso Classics|The traditional espresso favourites.
0: C0001
1: H
2: Espresso Classics
3: The traditional espresso favourites.
Input: C0002|H|Espresso Espresions|Delicious blend of espresso, milk, and luscious flavours.
0: C0002
1: H
2: Espresso Espresions
3: Delicious blend of espresso, milk, and luscious flavours.
Input: C0003|H|Tea & Cocoa|Gloria Jean's uses only the finest cocoas and single garden teas. Always satisfying.
0: C0003
1: H
2: Tea & Cocoa
3: Gloria Jean's uses only the finest cocoas and single garden teas. Always satisfying.
Input: C0004|C|Iced Chocolate|Gloria Jean's version of a traditional favourite.
0: C0004
1: C
2: Iced Chocolate
3: Gloria Jean's version of a traditional favourite.
Input: C0005|C|Mocha Chillers|An icy blend of chocolate, coffee, milk and delicious mix-ins.
0: C0005
1: C
2: Mocha Chillers
3: An icy blend of chocolate, coffee, milk and delicious mix-ins.
Input: C0006|C|Espresso Chillers|A creamy blend of fresh espresso, chocolate, milk, ice, and flavours.
0: C0006
1: C
2: Espresso Chillers
3: A creamy blend of fresh espresso, chocolate, milk, ice, and flavours.
Input: C0007|C|On Ice|Cool refreshing Gloria Jean's creations over ice.
0: C0007
1: C
2: On Ice
3: Cool refreshing Gloria Jean's creations over ice.
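With the single-character delimiter string ("|"), I get an extra newline after each item numbered 3, because fgets() keeps the newline at the end of the line and it ends up in the last token.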
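That looks pretty much like what you want. So either there is a problem with your input (do you echo it as you read it?), or you have managed to find a flaky implementation of strtok(), or perhaps you are on Windows and the data lines have carriage returns as well as newlines, and you are seeing misleading output caused by the stray carriage returns.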
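Of these, I suspect the last one (Windows and stray carriage returns) is the most likely, although I cannot reproduce the problem, even with a DOS-format data file (tested with GCC 4.6.0 on MacOS X 10.6.7).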