Bash: variable value is duplicated, use only one instance in the command

Time: 2018-07-17 13:25:06

Tags: bash

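In the bash below, I loop over paired .fastq files and use them in the commented-out command. The name is stored in the variable $pre and is extracted correctly. What I can't work out is how to use it only once in the commented-out command. In the example below, $pre is NA11111, but it is extracted twice, once for each file of the pair. Is there a way to use it only once in the command? I tried removing the duplicates with awk without luck, and also tried cut. Thank you :).

bash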

 for file in /home/cmccabe/Desktop/fastq/*.fastq ; do
 sample=${file%.fastq}
 bname=`basename $sample`
 pre="$(echo $bname|cut -d- -f1,1)"

#bwa mem -M -t 16 /home/cmccabe/Desktop/NGS/picard-tools-1.140/resources/ucsc.hg19.fasta "$sample.fastq" "$sample" /home/cmccabe/Desktop/fastq/${pre}_aln.sam
   echo "$sample.fastq"
   echo "$sample"
   echo "$pre"
   done

Current output

/home/cmccabe/Desktop/fastq/NA11111-100ng-E08A-C06_S5_L001_R1_001.fastq   `this is $sample.fastq`
/home/cmccabe/Desktop/fastq/NA11111-100ng-E08A-C06_S5_L001_R1_001         `this is $sample`
NA11111                                                                   `this is $pre`
/home/cmccabe/Desktop/fastq/NA11111-100ng-E08A-C06_S5_L001_R2_001.fastq   `this is $sample.fastq`
/home/cmccabe/Desktop/fastq/NA11111-100ng-E08A-C06_S5_L001_R2_001         `this is $sample`
NA11111                                                                   `this is $pre`

Desired output

#bwa mem -M -t 16 /home/cmccabe/Desktop/NGS/picard-tools-1.140/resources/ucsc.hg19.fasta "$sample.fastq" "$sample" /home/cmccabe/Desktop/fastq/${pre}_aln.sam

$sample.fastq = /home/cmccabe/Desktop/fastq/NA11111-100ng-E08A-C06_S5_L001_R1_001.fastq
$sample = /home/cmccabe/Desktop/fastq/NA11111-100ng-E08A-C06_S5_L001_R1_001
$pre = NA11111

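2 answers:

Answer 0: (score: 1)

The simplest approach is to keep track of the prefixes you have already seen and skip the current file if its prefix matches one of them.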

declare -A seen=()

for file in /home/cmccabe/Desktop/fastq/*.fastq ; do
  sample=${file%.fastq}
  bname=$(basename "$sample")
  pre=${bname%%-*}

  # Go to the next file if $pre has already been seen
  [[ -v seen[$pre] ]] && continue

  # Remember that we've now seen $pre
  seen[$pre]=

  bwa mem -M -t 16 /home/cmccabe/Desktop/NGS/picard-tools-1.140/resources/ucsc.hg19.fasta "$sample.fastq" "$sample" "/home/cmccabe/Desktop/fastq/${pre}_aln.sam"
done
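
The same bookkeeping can be tried without running bwa. The following is only a minimal sketch: `unique_prefixes` is a hypothetical helper name, not part of the answer, and it assumes the sample prefix is everything before the first `-` in the file name, as in the question. It tests for the key with `${seen[$pre]+x}` rather than `[[ -v ... ]]`, since `-v` on array elements needs a fairly recent bash (4.3 or later, as far as I know).

```shell
#!/usr/bin/env bash
# unique_prefixes (hypothetical helper): print each sample prefix once,
# in the order it is first encountered among the given file paths.
unique_prefixes() {
  local -A seen=()          # associative array, requires bash 4+
  local file bname pre
  for file in "$@"; do
    bname=${file##*/}       # strip the directory, like basename
    bname=${bname%.fastq}   # strip the extension
    pre=${bname%%-*}        # keep everything before the first '-'

    # ${seen[$pre]+x} expands to "x" only if the key already exists
    [[ -n ${seen[$pre]+x} ]] && continue
    seen[$pre]=1
    printf '%s\n' "$pre"
  done
}
```

Calling `unique_prefixes /home/cmccabe/Desktop/fastq/*.fastq` would then list each prefix a single time, which makes it easy to check the grouping before putting the real command in the loop.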

Answer 1: (score: 1)

I think you are trying to achieve the following:

for file in /home/cmccabe/Desktop/fastq/*_R1_*.fastq
do
    file2=$(echo "$file" | sed 's/_R1_/_R2_/')
    sample=$(basename "$file" .fastq | cut -d- -f1)

    bwa mem -M -t 16 -R "@RG\tID:$sample\tSM:$sample" /home/cmccabe/Desktop/NGS/picard-tools-1.140/resources/ucsc.hg19.fasta "$file" "$file2" > "/home/cmccabe/Desktop/fastq/${sample}_aln.sam"
done

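In my opinion, this is the most sensible way to handle the data. I assume that you need both ends of each pair and that you will post-process the results, hence the read group (-R) line.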
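
Before launching the alignment, it may help to do a dry run of the R1/R2 pairing. The sketch below is an illustration under assumptions, not part of the answer: `mate_of`, `sample_of` and `plan_pairs` are hypothetical helper names, the mate is derived with bash parameter expansion instead of sed, and any R1 file whose R2 mate is missing is skipped with a warning.

```shell
#!/usr/bin/env bash
# mate_of (hypothetical helper): derive the R2 path from an R1 path
# by replacing the first occurrence of _R1_ with _R2_.
mate_of() {
  printf '%s\n' "${1/_R1_/_R2_}"
}

# sample_of (hypothetical helper): the part of the file name before the first '-'.
sample_of() {
  local bname=${1##*/}
  printf '%s\n' "${bname%%-*}"
}

# plan_pairs (hypothetical helper): print "sample<TAB>R1<TAB>R2" for every
# R1 file that has a mate; it does not call bwa.
plan_pairs() {
  local r1 r2
  for r1 in "$@"; do
    r2=$(mate_of "$r1")
    if [[ ! -e $r2 ]]; then
      echo "skipping $r1: mate $r2 not found" >&2
      continue
    fi
    printf '%s\t%s\t%s\n' "$(sample_of "$r1")" "$r1" "$r2"
  done
}
```

Running `plan_pairs /home/cmccabe/Desktop/fastq/*_R1_*.fastq` prints one line per pair, so each sample name appears once and a missing mate is reported on stderr instead of being passed to the aligner.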