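I have multiple samples, each with R1 and R2 reads in fastq.gz format (the two files of a sample are mates of each other). I want to run BWA mem in paired-end mode on all of the files, in parallel, so that each R1/R2 pair of files produces a single SAM file. At the moment I am creating two SAM files from the two reads.
This is what I came up with, but it doesn't do what I need: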
for i in `find -maxdepth 2 -iname *fastq.gz -type f`; do
echo "bwa mem -t 12 /H.Sapiens/ucsc.hg19.fasta ${i}_R1_001.fastq.gz ${i}_R2_001.fastq.gz > ${i}_R1_R2.sam"
done
When run, it looks like this:
bwa mem -t 12 /H.Sapiens/ucsc.hg19.fasta ./Sample_0747/0747_CGG_L001_R2_001.fastq.gz_R1_001.fastq.gz ./Sample_0747/0747_CGG_L001_R2_001.fastq.gz_R2_001.fastq.gz > ./Sample_0747/0747_CGG_L001_R2_001.fastq.gz_R1_R2.sam
bwa mem -t 12 H.Sapiens/ucsc.hg19.fasta ./Sample_0748/0748_CCA_L001_R1_001.fastq.gz_R1_001.fastq.gz ./Sample_0748/0748_CCA_L001_R1_001.fastq.gz_R2_001.fastq.gz > ./Sample_0748/0748_CCA_L001_R1_001.fastq.gz_R1_R2.sam
-bash-4.1$
I understand that the problem lies with -iname, but how do I fix it? Thank you very much.
Answer 0 (score: 1)
Try
find -maxdepth 2 -iname \*fastq.gz -type f |
sed 's/_R[12]_001\.fastq\.gz$//' |
sort -u |
while IFS= read -r f; do
echo "bwa mem -t 12 /H.Sapiens/ucsc.hg19.fasta \"${f}_R1_001.fastq.gz\" \"${f}_R2_001.fastq.gz\" > \"${f}_R1_R2.sam\""
done
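The sed pipeline above assumes that every R1 file has an R2 mate. As a minimal sketch, the same pairing can be done with shell parameter expansion plus a check that the mate exists. The function name print_bwa_commands is a made-up helper, not part of bwa, and the sketch assumes a find and sort that support -print0 and -z (GNU and recent BSD versions do). Like the loop above, it only prints the commands.

```shell
#!/bin/bash
# Hypothetical helper: print one bwa mem command per R1/R2 pair under a directory.
# Dry run only: the commands are printed, not executed.
print_bwa_commands() {
    local dir=${1:-.} ref=/H.Sapiens/ucsc.hg19.fasta r1 base
    # Search for R1 files only, so each pair is visited exactly once.
    find "$dir" -maxdepth 2 -type f -name '*_R1_001.fastq.gz' -print0 |
    sort -z |
    while IFS= read -r -d '' r1; do
        base=${r1%_R1_001.fastq.gz}    # strip the R1 suffix to get the sample prefix
        if [ ! -f "${base}_R2_001.fastq.gz" ]; then
            echo "skipping $r1: no R2 mate found" >&2
            continue
        fi
        printf 'bwa mem -t 12 %s "%s" "%s" > "%s"\n' \
            "$ref" "${base}_R1_001.fastq.gz" "${base}_R2_001.fastq.gz" "${base}_R1_R2.sam"
    done
}
```

Calling print_bwa_commands . from the directory that holds the Sample_* folders should list one command per complete pair; unpaired R1 files are reported on stderr instead of producing a broken command.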
Answer 1 (score: 1)
Don't loop over a value parsed like that *. First, for the sake of sanity, put the code in a script, e.g.
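cat > script <<'SCRIPT'
#!/bin/bash
# Each argument is a fastq.gz path; act on R1 files only and derive the pair from them.
for f; do
    case $f in *_R1_001.fastq.gz) ;; *) continue ;; esac
    i=${f%_R1_001.fastq.gz}
    bwa mem -t 12 /H.Sapiens/ucsc.hg19.fasta "${i}_R"{1,2}_001.fastq.gz > "${i}_R1_R2.sam"
done
SCRIPT
chmod +x script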
Then use the -exec predicate or xargs, e.g.
find -maxdepth 2 -iname '*fastq.gz' -type f -exec ./script {} +
or
find -maxdepth 2 -iname '*fastq.gz' -type f -print0 | xargs -0 ./script
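The question also asks for the samples to be processed in parallel. With -exec ... {} + or plain xargs, the command receives many files and handles them one after another. One possible sketch uses the -P option of xargs (an extension found in GNU and BSD xargs, not in POSIX) together with -n 1, so that several invocations run at the same time. The function name run_pairs_in_parallel and its argument order are illustrative assumptions. Since each bwa mem call already uses 12 threads, the job count would need to be chosen to fit the machine.

```shell
#!/bin/bash
# Hypothetical wrapper: run a command once per R1 file, several at a time.
# usage: run_pairs_in_parallel DIR JOBS COMMAND [ARGS...]
# The R1 path is appended as the last argument of COMMAND.
run_pairs_in_parallel() {
    local dir=$1 jobs=$2
    shift 2
    # Pass only R1 files; the command is expected to derive the R2 mate itself.
    find "$dir" -maxdepth 2 -type f -name '*_R1_001.fastq.gz' -print0 |
        xargs -0 -n 1 -P "$jobs" "$@"
}
```

For example, run_pairs_in_parallel . 2 ./script would run two copies at once, assuming ./script accepts a single R1 path and works out the matching R2 file and output name from it.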
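* It says "parsing ls", but it applies to parsing the output of any command meant for human consumption. It explicitly calls out find.
On another note, if you don't quote the arguments to find, the shell may interpret them as globs.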
find -iname *fastq.gz
can expand to
find -iname foofastq.gz barfastq.gz bazfastq.gz
You want
find -iname '*fastq.gz'
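To see the difference without involving find itself, the expansion can be checked with echo in a scratch directory. This is only a demonstration: the directory comes from mktemp and the two file names are made up.

```shell
#!/bin/bash
# Demonstration in a scratch directory; the file names are made up.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch foofastq.gz barfastq.gz

# Unquoted: the shell expands the pattern before find would ever see it.
unquoted=$(echo find -iname *fastq.gz)
# Quoted: the pattern is passed through unchanged.
quoted=$(echo find -iname '*fastq.gz')

echo "$unquoted"   # prints: find -iname barfastq.gz foofastq.gz
echo "$quoted"     # prints: find -iname *fastq.gz
```

The unquoted form only appears to work when no file in the current directory matches the pattern, because the shell then leaves the pattern as it is. With exactly one match, find quietly searches for that single name.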