Cutting a file into multiple files based on numbers in a list:
$ wc -l all.txt
8500 all.txt
$ wc -l STS.*.txt
2000 STS.input.answers-forums.txt
1500 STS.input.answers-students.txt
2000 STS.input.belief.txt
1500 STS.input.headlines.txt
1500 STS.input.images.txt
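How can I split all.txt into chunks matching the number of lines in each STS.*.txt file, and then save each chunk to the corresponding STS.output.*.txt?
I have been doing it manually like this: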
$ sed '1,2000!d' all.txt > STS.output.answers-forums.txt
$ sed '2001,3500!d' all.txt > STS.output.answers-students.txt
$ sed '3501,5500!d' all.txt > STS.output.belief.txt
$ sed '5501,7000!d' all.txt > STS.output.headlines.txt
$ sed '7001,8500!d' all.txt > STS.output.images.txt
The all.txt input looks like this:
$ head all.txt
2.3059
2.2371
2.1277
2.1261
2.0576
2.0141
2.0206
2.0397
1.9467
1.8518
Sometimes all.txt looks like this:
$ head all.txt
2.3059 92.123
2.2371 1.123
2.1277 0.12452
2.1261123 213
2.0576 100
2.0141 0
2.02062 1
2.03972 34.123
1.9467 9.23
1.8518 9123.1
As for the STS.*.txt files, they are just lines of plain text, e.g.:
$ head STS.output.answers-forums.txt
The problem likely will mean corrective changes before the shuttle fleet starts flying again. He said the problem needs to be corrected before the space shuttle fleet is cleared to fly again.
The technology-laced Nasdaq Composite Index .IXIC inched down 1 point, or 0.11 percent, to 1,650. The broad Standard & Poor's 500 Index .SPX inched up 3 points, or 0.32 percent, to 970.
"It's a huge black eye," said publisher Arthur Ochs Sulzberger Jr., whose family has controlled the paper since 1896. "It's a huge black eye," Arthur Sulzberger, the newspaper's publisher, said of the scandal.
Answer 0 (score: 1)
I'd suggest writing a loop:
total=0   # lines of all.txt consumed so far
for file in answers-forums answers-students belief headlines images; do
    lines=$(wc -l < "STS.input.$file.txt")
    sed "$(( total + 1 )),$(( total + lines ))!d" all.txt > "STS.output.$file.txt"
    (( total += lines ))
done
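total keeps track of how many lines have been read so far. The sed command extracts lines total + 1 through total + lines and writes them to the corresponding output file.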
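To check the arithmetic without the real data, here is a minimal sketch on a made-up 10-line all.txt split into 2, 3 and 5 lines. The names a, b, c and the temporary directory are hypothetical stand-ins for the STS files, and sed -n 'M,Np' is used in place of 'M,N!d' (both select the same range).

```shell
# Hypothetical miniature data set: 10 lines to be split into 2, 3 and 5.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 10 > all.txt
seq 1 2 > STS.input.a.txt
seq 1 3 > STS.input.b.txt
seq 1 5 > STS.input.c.txt

total=0   # lines of all.txt already assigned to an output file
for file in a b c; do
    lines=$(wc -l < "STS.input.$file.txt")
    first=$(( total + 1 ))
    last=$(( total + lines ))
    echo "$file: lines $first-$last"
    # Print only the range first..last of all.txt
    sed -n "${first},${last}p" all.txt > "STS.output.$file.txt"
    total=$last
done
```

This should report the ranges 1-2, 3-5 and 6-10, which is the same pattern as the hand-written 1,2000 / 2001,3500 / ... commands in the question.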
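Answer 1 (score: 1)
I wish you had posted sample input for splitting a file of, say, 10 lines into output files of, say, 2, 3 and 5 lines instead of 8500 lines, since that would have given us something to test a solution against. Oh well, this might work, but it is of course untested: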
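awk '
    # While reading the STS.input.* files: map each cumulative line number to an output name
    ARGIND < (ARGC-1) { outfile[NR] = gensub(/input/,"output",1,FILENAME); next }
    # While reading all.txt (the last argument): write each line to its mapped file
    { print > outfile[FNR] }
' STS.input.* all.txt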
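The above uses GNU awk for ARGIND and gensub().
It just creates an array that maps each line number, counted across all of the "input" files, to the name of the "output" file that the same line number of "all.txt" should be written to.
Any time you write a loop in shell just to manipulate text, you have the wrong approach. The people who created shell also created awk for shell to call to manipulate text, so just do that.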
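If GNU awk is not available, the same line-number-to-filename mapping can be sketched with plain POSIX awk. This is a variant rather than the code above: it compares FILENAME with the last argument instead of using ARGIND, and uses sub() on a copy instead of gensub(). The 10-line data set and the names a, b, c are hypothetical, and the sketch assumes all.txt has no more lines than the input files combined.

```shell
# Hypothetical miniature data set: 10 lines to be split into 2, 3 and 5.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 10 > all.txt
seq 1 2 > STS.input.a.txt
seq 1 3 > STS.input.b.txt
seq 1 5 > STS.input.c.txt

awk '
    # Every file except the last argument only contributes its line count:
    # record which output file each cumulative line number (NR) maps to.
    FILENAME != ARGV[ARGC-1] {
        out = FILENAME
        sub(/input/, "output", out)
        outfile[NR] = out
        next
    }
    # Last argument (all.txt): FNR is the line number within all.txt.
    { print > outfile[FNR] }
' STS.input.* all.txt
```

As with the original, the result depends on the shell expanding STS.input.* in the same order in which the chunks appear in all.txt.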