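I am trying to extract some information from a text log file (from the Bowtie2 sequence aligner) and present it in a table. The text file looks like this: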
Time loading reference: 00:00:00
Time loading forward index: 00:00:00
Time loading mirror index: 00:00:00
Multiseed full-index search: 00:21:50
3746112 reads; of these:
3746112 (100.00%) were paired; of these:
2937631 (78.42%) aligned concordantly 0 times
581094 (15.51%) aligned concordantly exactly 1 time
227387 (6.07%) aligned concordantly >1 times
----
2937631 pairs aligned 0 times concordantly or discordantly; of these:
5875262 mates make up the pairs; of these:
5382980 (91.62%) aligned 0 times
400492 (6.82%) aligned exactly 1 time
91790 (1.56%) aligned >1 times
28.15% overall alignment rate
Time searching: 00:21:50
Overall time: 00:21:50
I defined some variables with the following commands. Some of them hold two fields; for example, for the file above RDS_P equals 3746112 (100.00%):
RDS_T=`awk NR==5 GW2.log | awk '{print $1}'` #total number of reads
RDS_P=`awk NR==6 GW2.log | awk '{print $1, $2}'` #Paired reads and percentage (2 fields)
RDS_C1=`awk NR==8 GW2.log | awk '{print $1, $2}'` #concordantly once and percentage (2 fields)
RDS_C2=`awk NR==9 GW2.log | awk '{print $1, $2}'` #concordantly twice and percentage (2 fields)
ALGN_T=`awk NR==16 GW2.log | awk '{print $1}'`
I used this to build a table, but it does not work very well:
printf "File\t Reads\t Paired reads\t Conc reads1\t Conc Reads2\t Total align\n\n\n GW1\t "%s$RDS_T\t" "%s" "$RDS_P"\t "%s" "$RDS_C1"\t "%s" "$RDS_C2"\t "%s$ALGN_T"\n"
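Individually, though, these work:
printf "%s$RDS_T"
and
printf "%s" "$RDS_P"
One thing I noticed is that \t is not interpreted.
Any ideas on how to do this? I am very new to bash, so please be gentle :)
Many thanks, Guy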
Answer 0 (score: 0)
There is no need to call awk multiple times. You can do everything with a single awk script. Try the following command:
awk -f t.awk GW2.log
where t.awk is:
NR==5 {
RDS_T=$1
}
NR==6 {
RDS_P=$1" "$2
}
NR==8 {
RDS_C1=$1" "$2
}
NR==9 {
RDS_C2=$1" "$2
}
NR==16 {
ALGN_T=$1
}
END {
fmt="%-12s %-12s %-18s %-18s %-18s %-18s\n"
printf fmt, "File", "Reads", "Paired reads", "Conc reads1", "Conc Reads2", "Total align"
printf fmt, "GW2.log", RDS_T, RDS_P, RDS_C1, RDS_C2, ALGN_T
}
with output:
File         Reads        Paired reads       Conc reads1        Conc Reads2        Total align
GW2.log      3746112      3746112 (100.00%)  581094 (15.51%)    227387 (6.07%)     28.15%
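The script above selects fields by line number, which only works while the log keeps exactly this layout. As a possible variation (not part of the answer itself), the same fields can be picked by matching text on each line. The sketch below is self-contained: it writes a shortened copy of the question's log to a temporary file, and the `|`-separated output format is just an assumption chosen to keep the example short.

```shell
#!/bin/sh
# Shortened sample log; the values are copied from the question.
log=$(mktemp)
cat > "$log" <<'EOF'
Multiseed full-index search: 00:21:50
3746112 reads; of these:
3746112 (100.00%) were paired; of these:
2937631 (78.42%) aligned concordantly 0 times
581094 (15.51%) aligned concordantly exactly 1 time
227387 (6.07%) aligned concordantly >1 times
28.15% overall alignment rate
EOF

# Each pattern identifies a line by its wording instead of its position.
summary=$(awk '
    / reads; of these:/              { reads  = $1 }
    /were paired/                    { paired = $1 " " $2 }
    /aligned concordantly exactly 1/ { conc1  = $1 " " $2 }
    /aligned concordantly >1/        { conc2  = $1 " " $2 }
    /overall alignment rate/         { total  = $1 }
    END { printf "%s|%s|%s|%s|%s\n", reads, paired, conc1, conc2, total }
' "$log")

printf '%s\n' "$summary"
rm -f "$log"
```

Matching on text should keep working if extra lines (for example a discordant-alignment section) appear in the log, whereas fixed NR values would then point at the wrong lines.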
Answer 1 (score: 0)
You are not using printf correctly.
The usage of the printf command is: printf format [arguments]. (See the man page.)
For example:
printf "My name is %s. I live in %s.\n" "John" "London"
So change your command to:
printf "File\tReads\tPaired reads\tConc reads1\tConc Reads2\tTotal align\nGW1\t%s\t%s\t%s\t%s\t%s\n" "$RDS_T" "$RDS_P" "$RDS_C1" "$RDS_C2" "$ALGN_T"
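To illustrate why \t was not interpreted in the original command: the inner double quotes there end the format string early, so at least the \t right after $RDS_T sits outside any quotes, where the shell strips the backslash before printf ever sees it. The small sketch below shows the difference; the variable values are copied from the question's log, and the names `unquoted`, `quoted` and `row` are only for illustration.

```shell
#!/bin/sh
# Sample values copied from the log in the question.
RDS_T="3746112"
RDS_P="3746112 (100.00%)"

# Outside quotes the shell removes the backslash, so printf receives the format "atb".
unquoted=$(printf a\tb)

# Inside quotes the backslash survives and printf turns \t into a tab.
quoted=$(printf 'a\tb')

# One format string, then one quoted argument per %s.
# Quoting "$RDS_P" keeps its embedded space inside a single argument.
row=$(printf 'GW1\t%s\t%s\n' "$RDS_T" "$RDS_P")

printf '%s\n' "$unquoted" "$quoted" "$row"
```

Keeping the data out of the format string also means a literal % in a value such as (100.00%) is printed as-is instead of being parsed as a conversion.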
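Answer 2 (score: 0)
This is my final script. I went with dogbane's option, because I wanted to introduce a loop and keep everything in a single file (i.e. no extra .awk file), so I did not use Håkon Hægland's approach (although I would love to learn how to do that within the current script). The script runs the Bowtie2 command for RNA-seq, creates the relevant directories, and places each .sam and .log file (one per sequencing library) in those directories. Finally, it generates a small .txt table containing some of the information from the .log files (e.g. the total number of reads). I plan to extend the script so that it also runs Tophat2, Cufflinks, etc., and possibly extracts some information from those outputs, such as graphs (using Cuffdiff and cummeRbund).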
#!/bin/bash
#run from /rdata/ngseq/Playground/guy/bowtie2
#to execute run: /localhome/gw57/Notes/pipeline3.sh
#Generates a "Summary.txt" file from the GW files
INPUT=/rdata/ngseq/original_data/rna/illumina/2013-05-05_Guy #
DATE=$(date +%d%m%y) #needs to add hours when run more than once per day
ROOT=140213_root_No_8
BT2INDEX=Bowtie2Index_Arabidopsis/genome
for i in {1..4}
do
    if [ ! -d ./$ROOT ]
    then
        mkdir ./$ROOT/
    fi
    if [ ! -d ./$ROOT/$DATE"_run" ]
    then
        mkdir ./$ROOT/$DATE"_run"
    fi
    mkdir ./$ROOT/$DATE"_run"/GW$i
    # A space before each trailing backslash keeps the continued arguments separate.
    bowtie2 --local -q -5 30 -3 30 --phred33 -N 1 -L 10 --no-discordant -t --no-unal -p 12 -x $BT2INDEX \
        -1 $INPUT/GW$i/fastq/R1.fastq -2 $INPUT/GW$i/fastq/R2.fastq \
        -S ./$ROOT/$DATE"_run"/GW$i/GW$i.sam 2>&1 | tee -a $ROOT/$DATE"_run"/GW$i/GW$i.log
done
# Header row of the summary table (overwrites any previous Summary.txt).
printf "%-18s%-18s%-18s%-18s%-18s%-18s\n\n" \
    "File" "Reads" "Paired_reads" "Conc Reads_once" "Conc_Reads>1" "Total_reads" > $ROOT/$DATE"_run"/Summary.txt
# One row per library, appended after the header.
for i in {1..4}
do
    RDS_T=`awk 'NR==5 {print $1}' $ROOT/$DATE"_run"/GW$i/GW$i.log`
    RDS_P=`awk 'NR==6 {print $1, $2}' $ROOT/$DATE"_run"/GW$i/GW$i.log`
    RDS_C1=`awk 'NR==8 {print $1, $2}' $ROOT/$DATE"_run"/GW$i/GW$i.log`
    RDS_C2=`awk 'NR==9 {print $1, $2}' $ROOT/$DATE"_run"/GW$i/GW$i.log`
    ALGN_T=`awk 'NR==18 {print $1}' $ROOT/$DATE"_run"/GW$i/GW$i.log`
    printf "%-18s%-18s%-18s%-18s%-18s%-18s\n" "GW$i" "$RDS_T" "$RDS_P" "$RDS_C1" "$RDS_C2" "$ALGN_T"
done >> $ROOT/$DATE"_run"/Summary.txt
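On the open question of using the single-awk approach without a separate .awk file: one way, sketched below, is to put the awk program inline in the loop and pass the library name in with -v. This is only a sketch under assumptions: `run_dir` is a hypothetical scratch directory standing in for $ROOT/$DATE"_run", only two libraries are looped over, both fake logs are copies of the log from the question, and the line numbers (including NR==16) match that log, so they may need adjusting for logs produced with other Bowtie2 options.

```shell
#!/bin/sh
# Hypothetical scratch directory standing in for $ROOT/$DATE"_run".
run_dir=$(mktemp -d)
for i in 1 2
do
    mkdir -p "$run_dir/GW$i"
    # Fake log: a copy of the log shown in the question.
    cat > "$run_dir/GW$i/GW$i.log" <<'EOF'
Time loading reference: 00:00:00
Time loading forward index: 00:00:00
Time loading mirror index: 00:00:00
Multiseed full-index search: 00:21:50
3746112 reads; of these:
3746112 (100.00%) were paired; of these:
2937631 (78.42%) aligned concordantly 0 times
581094 (15.51%) aligned concordantly exactly 1 time
227387 (6.07%) aligned concordantly >1 times
----
2937631 pairs aligned 0 times concordantly or discordantly; of these:
5875262 mates make up the pairs; of these:
5382980 (91.62%) aligned 0 times
400492 (6.82%) aligned exactly 1 time
91790 (1.56%) aligned >1 times
28.15% overall alignment rate
Time searching: 00:21:50
Overall time: 00:21:50
EOF
done

summary="$run_dir/Summary.txt"
# Header row, followed by a blank line.
printf '%-18s%-18s%-18s%-18s%-18s%-18s\n\n' \
    "File" "Reads" "Paired_reads" "Conc_Reads_once" "Conc_Reads>1" "Total_align" > "$summary"

# One awk call per log: collect the fields, then print the row in END.
for i in 1 2
do
    awk -v name="GW$i" '
        NR==5  { reads  = $1 }
        NR==6  { paired = $1 " " $2 }
        NR==8  { conc1  = $1 " " $2 }
        NR==9  { conc2  = $1 " " $2 }
        NR==16 { total  = $1 }
        END    { printf "%-18s%-18s%-18s%-18s%-18s%-18s\n", name, reads, paired, conc1, conc2, total }
    ' "$run_dir/GW$i/GW$i.log"
done >> "$summary"

table=$(cat "$summary")
printf '%s\n' "$table"
rm -rf "$run_dir"
```

Compared with five separate awk calls per log, this reads each file once and keeps the row format in a single place.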