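In the bash below, I loop over paired .fastq files and use them in the commented-out command. The variable $pre holds the sample name and is extracted correctly. What I can't work out is how to use it only once in the commented-out command. In the example below, $pre is NA11111, but it is extracted twice, once for each file of the pair. Is there a way to use it only once in the command? I tried removing the duplicates with awk and also tried cut, without luck. Thank you :).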
Bash
for file in /home/cmccabe/Desktop/fastq/*.fastq ; do
sample=${file%.fastq}
bname=`basename $sample`
pre="$(echo $bname|cut -d- -f1,1)"
#bwa mem -M -t 16 /home/cmccabe/Desktop/NGS/picard-tools-1.140/resources/ucsc.hg19.fasta "$sample.fastq" "$sample" /home/cmccabe/Desktop/fastq/${pre}_aln.sam
echo "$sample.fastq"
echo "$sample"
echo "$pre"
done
Current output
/home/cmccabe/Desktop/fastq/NA11111-100ng-E08A-C06_S5_L001_R1_001.fastq `this is $sample.fastq`
/home/cmccabe/Desktop/fastq/NA11111-100ng-E08A-C06_S5_L001_R1_001 `this is $sample`
NA11111 `this is $pre`
/home/cmccabe/Desktop/fastq/NA11111-100ng-E08A-C06_S5_L001_R2_001.fastq `this is $sample.fastq`
/home/cmccabe/Desktop/fastq/NA11111-100ng-E08A-C06_S5_L001_R2_001 `this is $sample`
NA11111 `this is $pre`
Desired output
#bwa mem -M -t 16 /home/cmccabe/Desktop/NGS/picard-tools-1.140/resources/ucsc.hg19.fasta "$sample.fastq" "$sample" /home/cmccabe/Desktop/fastq/${pre}_aln.sam
$sample.fastq = /home/cmccabe/Desktop/fastq/NA11111-100ng-E08A-C06_S5_L001_R1_001.fastq
$sample = /home/cmccabe/Desktop/fastq/NA11111-100ng-E08A-C06_S5_L001_R1_001
$pre = NA11111
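Answer 0 (score: 1)
The simplest approach is to keep track of the prefixes you have already seen and skip the current file if its prefix matches one of them.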
declare -A seen=()
for file in /home/cmccabe/Desktop/fastq/*.fastq ; do
sample=${file%.fastq}
bname=$(basename "$sample")
pre=${bname%%-*}
# Go to the next file if $pre has already been seen
[[ -v seen[$pre] ]] && continue
# Remember that we've now seen $pre
seen[$pre]=
bwa mem -M -t 16 /home/cmccabe/Desktop/NGS/picard-tools-1.140/resources/ucsc.hg19.fasta "$sample.fastq" "$sample" "/home/cmccabe/Desktop/fastq/${pre}_aln.sam"
done
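The loop above relies on an associative array, which needs bash 4 or newer. For older shells, here is a minimal sketch of the same "skip what you have already seen" idea using a space-delimited string instead. The function name `unique_prefixes` is hypothetical, and the sketch assumes the prefixes contain no spaces and no glob characters.

```shell
# Print each sample prefix (the text before the first '-') once,
# in first-seen order. Hypothetical helper, not part of the original answer.
unique_prefixes() {
    seen=" "                          # space-delimited list of prefixes handled so far
    for file in "$@"; do
        bname=${file##*/}             # drop the directory part, like basename
        bname=${bname%.fastq}         # drop the .fastq extension
        pre=${bname%%-*}              # keep everything before the first '-'
        case "$seen" in
            *" $pre "*) continue ;;   # prefix already seen: skip this file
        esac
        seen="$seen$pre "             # remember the prefix
        printf '%s\n' "$pre"
    done
}
```

Called as `unique_prefixes /home/cmccabe/Desktop/fastq/*.fastq`, it should print each prefix once; the `printf` line is where the per-sample command would go.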
Answer 1 (score: 1)
I think you are trying to achieve the following:
for file in /home/cmccabe/Desktop/fastq/*_R1_*.fastq
do
file2=$(echo "$file" | sed 's/_R1_/_R2_/')
sample=$(basename "$file" .fastq | cut -d- -f1)
bwa mem -M -t 16 -R "@RG\tID:$sample\tSM:$sample" /home/cmccabe/Desktop/NGS/picard-tools-1.140/resources/ucsc.hg19.fasta "$file" "$file2" > "/home/cmccabe/Desktop/fastq/${sample}_aln.sam"
done
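In my opinion, this is the most sensible way to handle the data. I am assuming that you need both ends of each pair and that you will post-process the result, which is why the read group (-R) line is included.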
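If you want to check the pairing before running the alignment, a dry run along these lines may help. It is only a sketch: the helper names `mate_of` and `list_pairs` are made up for illustration, and it assumes `_R1_` occurs only in the file name, not in the directory path.

```shell
# Derive the R2 mate path from an R1 path (hypothetical helper).
mate_of() {
    printf '%s\n' "$1" | sed 's/_R1_/_R2_/'
}

# For each R1 file whose R2 mate exists, print: prefix<TAB>R1<TAB>R2.
# R1 files without a mate are reported on stderr and skipped.
list_pairs() {
    for r1 in "$@"; do
        r2=$(mate_of "$r1")
        if [ ! -f "$r2" ]; then
            printf 'skipping %s: no mate %s\n' "$r1" "$r2" >&2
            continue
        fi
        # Sample prefix: file name without .fastq, up to the first '-'
        pre=$(basename "$r1" .fastq | cut -d- -f1)
        printf '%s\t%s\t%s\n' "$pre" "$r1" "$r2"
    done
}
```

Running `list_pairs /home/cmccabe/Desktop/fastq/*_R1_*.fastq` should give one line per sample, which makes it easy to confirm that each prefix is used exactly once before swapping the `printf` for the real `bwa mem` call.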