I have a file, test1.log:
04/15/2016 02:22:46 PM - kneaddata.knead_data - INFO: Running kneaddata v0.5.1
04/15/2016 02:22:46 PM - kneaddata.utilities - INFO: Decompressing gzipped file ...
Input Reads: 69766650 Surviving: 55798391 (79.98%) Dropped: 13968259 (20.02%)
TrimmomaticSE: Completed successfully
04/15/2016 02:32:04 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /home/liaoming/kneaddata_v0.5.1/WGC066610D/WGC066610D_kneaddata.trimmed.fastq
04/15/2016 05:32:31 PM - kneaddata.utilities - DEBUG: 55798391 reads; of these:
55798391 (100.00%) were unpaired; of these:
55775635 (99.96%) aligned 0 times
17313 (0.03%) aligned exactly 1 time
5443 (0.01%) aligned >1 times
0.04% overall alignment rate
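and other files with the same format but different contents: test2.log, test3.log, and so on up to test60.log.
I want to extract two numbers from each of these files. For test1.log, for example, the two numbers would be 55798391 and 55775635.
So the final generated file, counts.txt, would look like this: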
test1 55798391 55775635
test2 51000000 40000000
.....
test60 5000000 30000000
Answer 0 (score: 1)
awk to the rescue!
$ awk 'FNR==9{f=$1} FNR==10{print FILENAME,f,$1}' test{1..60}.log
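The one-liner above picks the counts by line position (lines 9 and 10 of each log) and prints FILENAME as-is, so the first column comes out as test1.log rather than test1. In the excerpt shown in the question the same two counts sit on lines 7 and 8, so the right positions depend on the exact layout of the full log. A minimal sketch of an alternative that matches on the text of the lines instead, assuming the phrases "were unpaired" and "aligned 0 times" each occur once per file; the demo_logs directory and its sample file are made up for the demonstration:

```shell
# Hypothetical scratch directory holding one sample log (lines copied from the question).
mkdir -p demo_logs
cat > demo_logs/test1.log <<'EOF'
04/15/2016 05:32:31 PM - kneaddata.utilities - DEBUG: 55798391 reads; of these:
55798391 (100.00%) were unpaired; of these:
55775635 (99.96%) aligned 0 times
17313 (0.03%) aligned exactly 1 time
EOF

# Select lines by content rather than by line number.
awk '
  /were unpaired/ { total = $1 }        # remember the read count
  /aligned 0 times/ {
    name = FILENAME
    sub(/.*\//, "", name)               # drop any leading directory
    sub(/\.log$/, "", name)             # drop the .log extension
    print name, total, $1
  }
' demo_logs/test*.log > demo_logs/counts.txt

cat demo_logs/counts.txt
```

With the real files, passing test{1..60}.log (in bash) instead of a glob keeps the numeric order; a plain glob sorts test10.log before test2.log.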
If the files are not all in the same directory, either call awk in a loop or build a list of files and pipe it to xargs awk:
$ for i in {1..60}; do awk ... test$i/test$i.log; done
$ for i in {1..60}; do echo test$i/test$i.log; done | xargs awk ...
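For the loop variant, one way to get the bare sample name into the first column is to pass it to awk with -v rather than relying on FILENAME. The following is only a sketch: the demo_dirs layout and the small counts in the sample logs are invented for the demonstration, and it matches lines by their text rather than by line number.

```shell
# Hypothetical layout: demo_dirs/test1/test1.log and demo_dirs/test2/test2.log,
# filled with made-up counts so the example can run on its own.
for i in 1 2; do
  mkdir -p "demo_dirs/test$i"
  printf '%s\n' \
    "$((i * 100)) (100.00%) were unpaired; of these:" \
    "$((i * 90)) (90.00%) aligned 0 times" \
    > "demo_dirs/test$i/test$i.log"
done

# One awk call per file; the sample name is handed in as the awk variable "name".
for i in 1 2; do
  awk -v name="test$i" '
    /were unpaired/   { total = $1 }
    /aligned 0 times/ { print name, total, $1 }
  ' "demo_dirs/test$i/test$i.log"
done > demo_dirs/counts.txt

cat demo_dirs/counts.txt
```

Redirecting the output of the whole loop, rather than of each awk call, writes counts.txt once and keeps the rows in loop order.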