Updated question:
I have a config.file that defines a few variables which are ultimately used in different scripts.
$ cat config.file
#1 Accession number ref
ref=L41223.2
#2 Accession number SRA
SRA=SRA7361534
#3 Path to SRA
path_SRA='/Volumes/5TB/sra/'
#4 Path to ref
path_ref='/Volumes/5TB/results/species1/'
#3 (the path to SRA) is constant and never changes. The other variables ($ref, $SRA and $path_ref) I would like to read one at a time from the different fields of input.file:
$ cat input.file
species1 L41223.2 SRA7361534
species2 D45023.5 SRA9473231
species3 L42823.6 SRA0918881
...
All of these variables are used multiple times in script.sh:
#!/bin/bash
# Path to the configuration file
. /Users/Main/config.file
# Use NCBI's e-utilities to download reference files
esearch -db nucleotide -query $ref | efetch -format fasta > $path_ref$ref.fasta
# Using NCBI's sratoolkit to download SRA file
prefetch $SRA
cd $path_SRA
mv *.sra $path_ref
# Decompress the SRA file
cd $path_ref; if fastq-dump --split-3 $SRA.sra ; then
echo "SRA file successfully decompressed. Deleting the SRA file now..."
rm $SRA.sra
else
echo "Could not decompress SRA file"
fi
# Use bwa to align DNA reads to the reference sequence
cd $path_ref;
bwa index -p INDEX $ref.fasta
bwa aln -t $core INDEX *_1.fastq > 1.sai
bwa aln -t $core INDEX *_2.fastq > 2.sai
bwa sampe INDEX 1.sai 2.sai *_1.fastq *_2.fastq | samtools view -hq 5 > $SRA.Q5.sam
# Use samtools for conversion
samtools view -bT $ref.fasta $SRA.Q5.sam > $SRA.Q5.bam
samtools sort $SRA.Q5.bam -o $SRA.sorted.bam
# use bedtools for coverage
bedtools genomecov -d -ibam $SRA.sorted.bam > $SRA.gencov.txt
# use awk for extraction
awk '$2 ~ /81|161|97|145/ {print $0}' $SRA.Q5.sam > $SRA.OTW.sam
samtools view -bT $ref.fasta $SRA.OTW.sam > $SRA.OTW.bam
samtools sort $SRA.OTW.bam -o $SRA.OTW.sorted.bam
# Extract FLAG, POS, CIGAR and TLEN for outward-oriented reads
awk '$2 ~ /81|161|97|145/ {print $2, $4, $6, $9}' $SRA.Q5.sam > $SRA.OTW.txt
# Get per-base coverage for outward-oriented reads
bedtools genomecov -d -ibam $SRA.OTW.sorted.bam > $SRA.OTW.gencoverage.txt
# Simplify the output by averaging read coverage over 50 bp window; prints the average count value and last genomic position
awk '{sum+=$3; count++} FNR % 50 == 0 {print $2, (sum/count); count=sum = ""}' $SRA.OTW.gencoverage.txt > $SRA.OTW.50sum.txt
#### End of the script
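What I want to do is "read" from input.file into config.file. The first field (species1 ...) serves as input for $path_ref, field 2 (L41223.2 ...) as input for $ref, and the third field (SRA7361534 ...) as input for the $SRA variable. Once the first round (basically the first line) has finished, script.sh should run again and read fields 1, 2 and 3 from line 2, and so on. It is basically a loop, but somewhat more complicated than the answer below, because the different variables are called at different places in the script.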
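To make the intended mapping concrete, here is a minimal sketch for a single line of input.file. The name results_base is hypothetical (it is not in config.file) and assumes that $path_ref should be assembled from the results directory plus the species name in field 1.

```shell
#!/bin/bash
# Hypothetical base directory, inferred from the path_ref value in config.file
results_base='/Volumes/5TB/results/'

# One line of input.file, written as a literal for illustration
line='species1 L41223.2 SRA7361534'

# Split the line on whitespace into the three fields
read -r species ref SRA <<EOF
$line
EOF

# Field 1 only holds the species name, so the full path is built from it
path_ref="${results_base}${species}/"

echo "path_ref=$path_ref ref=$ref SRA=$SRA"
```

For the first line this should reproduce the values currently hard-coded in config.file.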
The following works fine for one variable, but I cannot get it to work with three different variables throughout the script:
while read -r c1 c2 c3; do
bwa index -p INDEX ${c2}.fasta
# place rest of your script here
done < input.file
Thank you very much.
Answer 0 (score: 0)
In script.sh, after the line
. /Users/Main/config.file
add the following lines:
number_of_inputs=$(wc -l < input.file)
for (( i=1 ; i <= number_of_inputs ; i++ )); do
# extract columns $1, $2, $3 here, from line $i - please change appropriately
ref=$( awk "NR==$i{print \$1}" input.file)
SRA=$( awk "NR==$i{print \$2}" input.file)
path_ref=$(awk "NR==$i{print \$3}" input.file)
Then add a done at the end of the file, so that the whole process loops over the values in each line of input.file and sets the variables accordingly.
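As a possible alternative, the same loop can be sketched with while read, which splits each line into fields directly instead of calling awk three times per line. This is only a sketch under assumptions: the field order follows the question (species, ref, SRA), results_base and run_pipeline are hypothetical names, and a temporary file stands in for input.file so the example runs on its own.

```shell
#!/bin/bash
# Hypothetical base directory; in the real setup this could live in config.file
results_base='/Volumes/5TB/results/'

# Small stand-in for input.file so the sketch is self-contained
input_file=$(mktemp)
printf '%s\n' \
  'species1 L41223.2 SRA7361534' \
  'species2 D45023.5 SRA9473231' > "$input_file"

# Stand-in for the body of script.sh; it only reports what it would process
run_pipeline() {
  echo "ref=$ref SRA=$SRA path_ref=$path_ref"
}

processed=0
# Each iteration reads one line and splits it into three whitespace-separated fields
while read -r species ref SRA; do
  # Skip blank lines
  [ -n "$species" ] || continue
  # Build the per-species results directory from field 1
  path_ref="${results_base}${species}/"
  run_pipeline
  processed=$((processed + 1))
done < "$input_file"

rm -f "$input_file"
```

In the real script, the body of run_pipeline would be the existing commands from script.sh; because they already refer to $ref, $SRA and $path_ref, they should not need to change. If any command inside the loop reads standard input, it may consume lines of input.file; redirecting that command's input from /dev/null is one common way around this.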