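I want to split the long sentences in a text into shorter ones at an arbitrary cut point. My approach counts words by counting whitespace. Given an input file input.txt with the content: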
ciao
ciao ciao
ciao ciao ciao ciao ciao ciao
ciao ciao ciao ciao
ciao ciao ciao
I would like to get:
ciao
ciao ciao
ciao ciao ciao
ciao ciao ciao
ciao ciao ciao
ciao
ciao ciao ciao
with a cut point of 3.
I tried to solve the problem with the following code:
#include<stdlib.h>
#include<stdio.h>
#include<ctype.h>
/* MAIN */
int main(int argc, char *argv[]){
    FILE *inp = fopen(argv[1], "r");
    char c;
    int word_counter = 0;
    while((c = fgetc(inp)) != EOF){
        printf("%c", c);
        if(isspace(c))
            ++word_counter;
        /* Cutter */
        if(word_counter == 3){
            printf("\n");
            word_counter = 0; /* counter to zero */
        }
    }
    return 0;
}
which returns, as output:
ciao
ciao ciao
ciao ciao ciao
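I can't understand the reason for this behavior. Shouldn't the code only print an extra newline when the condition is met? Why are whole sentences skipped?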
Answer 0 (score: 1)
After reading a newline character, you need to reset word_counter to zero.
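To see why the reset matters, here is a minimal sketch that runs the same counting logic as the question's loop on an in-memory string instead of a file. The helper name question_loop and the string-based interface are assumptions made for illustration only; they are not part of the original program.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: mirrors the question's loop, but reads from a
 * string and writes into `out` instead of using fgetc/printf.
 * `out` must have room for 2 * strlen(in) + 1 bytes. */
void question_loop(const char *in, char *out)
{
    int word_counter = 0;
    size_t n = 0;

    for (; *in != '\0'; ++in) {
        out[n++] = *in;                      /* every character is echoed */
        if (isspace((unsigned char)*in))
            ++word_counter;                  /* '\n' is counted as well */
        /* Cutter: fires on the third whitespace, wherever it occurs */
        if (word_counter == 3) {
            out[n++] = '\n';
            word_counter = 0;
        }
    }
    out[n] = '\0';
}
```

Fed "ciao\nciao ciao\n", the counter reaches 3 at the newline that ends the second line, because the newline after the first line was already counted. The helper therefore appends an extra '\n' after "ciao ciao", even though no line contains three words.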
Also, every c is printed twice when word_counter != 3:
printf("%c", c); // ** here
if(isspace(c))
    ++word_counter;
/* Cutter */
if(word_counter == 3){
    printf("\n");
    word_counter = 0;
}
else
    printf("%c", c); // ** and here
Maybe try this (untested):
while((c = fgetc(inp)) != EOF){
    if (isspace(c) && ++word_counter == 3 ) {
        printf("\n");
        word_counter = 0; /* counter to zero */
        continue;
    }
    if (c == '\n') {
        word_counter = 0;
    }
    printf("%c", c);
}
Shorter:
while((c = fgetc(inp)) != EOF){
    if ( (isspace(c) && ++word_counter == 3) || (c == '\n') ) {
        printf("\n");
        word_counter = 0; /* counter to zero */
        continue;
    }
    printf("%c", c);
}
Also keep in mind that isspace(c) returns true when c == '\n', so a more robust version that also handles \r\n would be:
while((c = fgetc(inp)) != EOF){
    if ( (c == ' ' || c == '\t') && (++word_counter == 3) ) {
        word_counter = 0;
        printf("\n");
        continue;
    }
    if ( c == '\r' || c == '\n' ) {
        word_counter = 0;
    }
    printf("%c", c);
}
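For reference, the last loop can be wrapped in a small function, sketched below under a few assumptions: the name split_lines, the FILE * parameters and the cut parameter are hypothetical and not part of the original answer. The sketch also declares c as int rather than char, since fgetc returns an int and a char may not be able to represent EOF reliably.

```c
#include <stdio.h>

/* Hypothetical helper: copies `in` to `out`, replacing every `cut`-th
 * space or tab on a line with a newline. The counter restarts at each
 * '\r' or '\n', so existing line breaks are preserved. */
void split_lines(FILE *in, FILE *out, int cut)
{
    int c;                  /* int, not char, so EOF stays distinguishable */
    int word_counter = 0;

    while ((c = fgetc(in)) != EOF) {
        if ((c == ' ' || c == '\t') && ++word_counter == cut) {
            word_counter = 0;
            fputc('\n', out);   /* the separator itself is dropped */
            continue;
        }
        if (c == '\r' || c == '\n')
            word_counter = 0;   /* a new line starts a new count */
        fputc(c, out);
    }
}
```

From main, this could be called as split_lines(inp, stdout, 3) after checking that fopen did not return NULL. With the sample input.txt from the question, it should produce the seven lines listed as the desired output.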