I have a file containing a large amount of text, separated by newlines, i.e.:
"This is sentence 1.\n"
"This is sentence 2.\n"
"This is sentence 3. It has more characters then some other ones.\n"
"This is sentence 4. Again it also has a whole bunch of characters.\n"
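I would like to use command-line tools that count the number of characters in each line and then, if a line has more than X characters, split it on periods (".") and count the characters in each element of the split line.
I.e., the final output, by line number: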
1. 24
2. 24
3. 69: 20, 49 (i.e. "This is sentence 3" has 20 characters, "It has more characters then some other ones" has 49 characters)
wc only accepts a filename as input, so I can't get it to take a text string and count its characters.
head -n2 processed.txt | tr "." "\n" | xargs -0 -I line wc -m line
gives me the error: ": open: No such file or directory"
Answer 0 (score: 2)
awk -F. '{print length($0),NF,length($1)}' yourfile
Output:
23 2 19
23 2 19
68 3 19
70 3 19
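It uses the period as the field separator (-F.) and prints the length of the whole line ($0), the number of fields (NF), and the length of the first field ($1).
Here is another small example that prints the whole line followed by field lengths. Because the loop starts at i=0, the first number is the length of the whole line ($0), and the loop stops before the last field, which is the text after the final period: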
awk -F. '{print $0;for(i=0;i<NF;i++)print length($i)}' yourfile
"This is sentence 1.\n"
23
19
"This is sentence 2.\n"
23
19
"This is sentence 3. It has more characters then some other ones.\n"
68
19
44
"This is sentence 4. Again it also has a whole bunch of characters.\n"
70
19
46
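Combining the two commands above gets close to the output asked for in the question. The following is only a sketch under some assumptions: the function name count_sentences and the default threshold of 30 are made up for illustration, the file is assumed to hold plain sentences (no literal quotes or \n text), and leading spaces are stripped from each piece before counting, so the numbers will not match the question's sample exactly.

```shell
# Hypothetical helper: count_sentences FILE [MAX]
# Prints "N. LEN" for every line; when LEN exceeds MAX (default 30,
# an arbitrary value), the length of each period-separated piece is appended.
count_sentences() {
    awk -F. -v max="${2:-30}" '
    {
        out = NR ". " length($0)
        if (length($0) > max) {
            sep = ": "
            for (i = 1; i <= NF; i++) {
                f = $i                  # work on a copy so $0 is not rebuilt
                sub(/^[ \t]+/, "", f)   # do not count the space after a period
                if (f == "") continue   # skip the empty piece after a trailing period
                out = out sep length(f)
                sep = ", "
            }
        }
        print out
    }' "$1"
}
```

For a file holding the two plain lines "This is sentence 1." and "This is sentence 3. It has more characters then some other ones.", calling count_sentences file 30 should print "1. 19" and "2. 64: 18, 43".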
By the way, "wc" can handle a string sent to stdin like this:
echo -n "Hello" | wc -c
5
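The same stdin trick can be applied line by line. This is a rough sketch rather than a recommended solution: line_lengths is a hypothetical name, and the loop starts one wc process per line, so the awk approach will likely be much faster on a large file. Note that wc -c counts bytes while wc -m counts characters; for plain ASCII text the two agree.

```shell
# Hypothetical helper: line_lengths FILE
# Sends each line to wc through stdin and prints "N. COUNT".
line_lengths() {
    n=0
    # "|| [ -n "$line" ]" also handles a last line without a trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        # printf '%s' leaves out the newline, so it is not counted
        count=$(printf '%s' "$line" | wc -m)
        # $((count)) drops the padding that some wc implementations print
        echo "$n. $((count))"
    done < "$1"
}
```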
Answer 1 (score: 0)
How about:
head -n2 processed.txt | tr "." "\n" | wc -m
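You should get a better understanding of what xargs does and how pipes work. Before using them, search for a good tutorial on both =).
xargs passes each line separately to the next utility as a command-line argument, which wc then tries to open as a file. That is not what you want: you want wc to receive all the lines on its standard input. So just pipe the entire output of tr to it.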
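The difference between a plain pipe and xargs can be seen with a tiny experiment. This is only an illustrative sketch: the variable names from_pipe and from_xargs are made up, and echo stands in for wc so that the argument is visible instead of causing an error.

```shell
# Plain pipe: the text arrives on wc's stdin, so wc counts its characters.
from_pipe=$(printf 'Hello' | wc -m)

# xargs: the text is appended to the command line as an argument.
# With wc in place of echo, wc would try to open a file named "Hello".
from_xargs=$(printf 'Hello' | xargs echo "got argument:")

echo "pipe:  $((from_pipe))"
echo "xargs: $from_xargs"
```

The first line of output reports a count of 5, while the second shows that the text was handed over as an argument rather than as input data.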