I have a FASTA file like this:
>gnl|SRA|SRR035294.8571.2 FIHSSUW01ASCWS.2 length=224
GAGATGAAATAGATCTTGGCATATATGTACATGCTTGATCTCAGTTTTGATTGGATTTTATCCATTTTAG
CTATCTTAACTATTAATCTTGAAATGAAGCTTTAATTTATGTAGGAAGTTTATGAAATTTAGGAAAAAAA
AAGAAAAAAACAAAACAATGTCGGCCGCCTCGGTCTCTACTGAGACACGCAACAGGGGATAGGCAAGGCA
CACAGGGGATAGGN
>gnl|SRA|SRR035294.8572.2 FIHSSUW01ETZME.2 length=254
ACTAACCAGGTGGTAAACAACTACTACAGGCCAGATTTGAAGAAGGCTGCTCTTGCTAGATTGAGTGCAG
TGAACAGAAGCCTTAAGGTTTCAAAGTCTGGTGTGAAGAAGAAGAACAGACAGGCAGTTAGGATCCATGG
TAGGAAGTGAAGCTGTGATTTGCCTACCGTCTGATATTCATCGTATCACTTTCTAGCTGTTCCGTCTTGT
TTGGCAAGTGTTTGGTTTTACGTGCGAGTAGTTATATGTTGCGC
>gnl|SRA|SRR035294.8573.2 FIHSSUW01AZA99.2 length=230
AAGCAGTGGTATCAACGCAGAGTGGCCATTACGGCCGGGGATGTACCAATTCAAAAAGAAAACAGCAGTT
GGGGGCAAAACAATTAAGTTGTAACGAATGCATATATATGATTAATCTTCTAACACATTATTTTTGTCTC
AAAAAAAAAGAAAAAAAACAAAACATGTCGGCCGCCTCGGTCTCTACTGAGACACGCAACAGGGGATAGG
CAAGGCACACAGGGGATAGG
>gnl|SRA|SRR035294.8574.2 FIHSSUW01EHI3P.2 length=153
TGCAAGTTTACAACTTAAAACAACTTTTCTCACAGTGAACAATAAATTTATCAATTCTCATGCAAAAAAA
AAGAAAAAAACAAAAACATGTCGGCCGCCTCGGTCTCTACTGAGACACGCAACAGGGGATAGGCAAGGCA
CACAGGGGATAGG
>gnl|SRA|SRR035294.8575.2 FIHSSUW01EWK4S.2 length=287
AACAGTGGTATCAACGCAGAGTGGCCATTACGGCCGGGAGATTACAGGTATTGCAAGTTTCAAGCCTGTC
ATAAAGACTCAAAGCCGCTTGTAATTTGTGTTTCCTAGTTGGGGAAGCTGTTTGTTCTTTATTGTGCTAT
ATGTATTTATTTGAAAGTTTGGATGAACTCAATAAATAAAAGAAAATCTTCATTGTGGGTTACAATTTGG
ACATGAACATGCATGAATAATGTACCAATTTAGCAAAAAAAAAGAAAAAAACAAAAAACAAATAGTCGGC
CGGCCCG
>gnl|SRA|SRR035294.8576.2 FIHSSUW01C911A.2 length=265
TATTCTCAGGTACGAAATATGAGTTTGCTGATAAATTGATGGATTGGGAATCAGCCTGCATAATAAGATA
TTCCCAATTAACTTTGCCCGTTAGTTCTTTTAGCTTTTCCTTTAAAGGCACGAGTCTTTCAACCAAAACA
TTACAGCAAAGTCTAACTGCCTCACAGCTTGCTTCAGAAGTTGTACCCCCGGCCGTAATGGCCACTCTGC
GTTGATACCACTGCTTCTGAGACACGCAACAGGGGATAGGCAAGGCACACAGGGG
I have written this script in bash:
STRING=$1
FILE=$(pwd)"/"$2
if [ -z "$STRING" ]
then
echo "Usage: fastaFind.sh <query> <fasta file>"
else
echo ""
awk 'BEGIN { RS = ">" } ; $0 ~ "'$STRING'" { print $0 }' "$FILE"
fi
I am running this command:
fastaFind.sh "gnl|SRA|SRR035294.8573.2 FIHSSUW01AZA99.2 length=230" file.fasta
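but it fails with an "unterminated string" error. What I want is for the command to retrieve the specific sequence that matches the query, e.g.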
>gnl|SRA|SRR035294.8573.2 FIHSSUW01AZA99.2 length=230
AAGCAGTGGTATCAACGCAGAGTGGCCATTACGGCCGGGGATGTACCAATTCAAAAAGAAAACAGCAGTT
GGGGGCAAAACAATTAAGTTGTAACGAATGCATATATATGATTAATCTTCTAACACATTATTTTTGTCTC
AAAAAAAAAGAAAAAAAACAAAACATGTCGGCCGCCTCGGTCTCTACTGAGACACGCAACAGGGGATAGG
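Answer 0: (score: 1)
There are a couple of issues to address.
You want to use the shell variable STRING inside the awk call, so the entire awk command would have to be enclosed in double quotes. But then you would have to escape $0 in the awk command, since the shell would otherwise expand it. A regular-expression match with ~ is also unsuitable, because the pattern contains characters that have special meaning in regular expressions (such as |). So you need a way to compare against one part of the input record literally; that is the reason behind the comparison with $1 (made possible by redefining FS).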
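The "unterminated string" error in the question is consistent with ordinary shell word splitting: $STRING is expanded outside any quotes, and the query contains spaces, so awk would receive only the first piece of the program as its script. The following is a minimal sketch of that effect, not part of the original answers; count_args is a hypothetical helper introduced only for illustration.

```shell
#!/bin/bash
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

STRING='gnl|SRA|SRR035294.8573.2 FIHSSUW01AZA99.2 length=230'

# Unquoted expansion: the shell splits the value at its two spaces, so the
# awk program would arrive as three separate arguments. The first one ends
# in the middle of a double-quoted awk string.
count_args 'BEGIN { RS = ">" } ; $0 ~ "'$STRING'" { print $0 }'

# Quoted expansion: the whole program stays in a single argument.
count_args 'BEGIN { RS = ">" } ; $0 ~ "'"$STRING"'" { print $0 }'
```

Quoting the expansion removes the splitting, but the query would still be interpreted as a regular expression, which is why passing it with -v and comparing it literally is the safer route.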
Answer 1: (score: 1)
Or simply:
awk -v "RS=>" '/length=254/ { print $0; }' file
Answer 2: (score: 1)
Your awk command would be better written as:
awk 'BEGIN{ ORS = ""; RS = ">"; FS="\n" } $1 == "pattern" { print ">" $0 }' file
Or:
awk -v p="pattern" 'BEGIN {ORS = ""; RS = ">"; FS = "\n" } $1 == p { print ">" $0 }' file
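To illustrate why the literal comparison $1 == p is used rather than a regular-expression match, here is a small sketch on made-up data. The helper names regex_match and literal_match and the two demo records are assumptions for this example only. In the query, | acts as alternation and . matches any character when treated as a regex, so the regex version selects more records than intended.

```shell
#!/bin/bash
# regex_match and literal_match are hypothetical helpers for this demo.
# Both split the file into records at ">" and into fields at newlines,
# so $1 is the header line of each record. NR > 1 skips the empty record
# that precedes the first ">".
regex_match()   { awk -v p="$1" 'BEGIN { RS = ">"; FS = "\n" } NR > 1 && $1 ~ p  { print $1 }' "$2"; }
literal_match() { awk -v p="$1" 'BEGIN { RS = ">"; FS = "\n" } NR > 1 && $1 == p { print $1 }' "$2"; }

# Made-up demo file: two headers that differ in a single character.
demo=$(mktemp)
printf '>seq|A.1 x\nACGT\n>seq|AB1 x\nTTTT\n' > "$demo"

# Treated as a regex, the query means "seq" OR "A.1 x": both headers match.
regex_match 'seq|A.1 x' "$demo"

# Compared as a string, only the identical header matches.
literal_match 'seq|A.1 x' "$demo"
```

One caveat: awk processes backslash escape sequences in values assigned with -v, so a query containing backslashes would need extra care.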
And your shell script would be:
#!/bin/bash
STRING=$1
FILE=$2
if [[ -z $STRING ]]; then
echo "Usage: fastaFind.sh <query> <fasta file>"
else
awk -v p="$STRING" 'BEGIN{ ORS=""; RS=">"; FS="\n" } $1 == p { print ">" $0 }' "$FILE"
fi
Example usage:
bash temp.sh 'gnl|SRA|SRR035294.8575.2 FIHSSUW01EWK4S.2 length=287' temp.txt
Output:
>gnl|SRA|SRR035294.8575.2 FIHSSUW01EWK4S.2 length=287
AACAGTGGTATCAACGCAGAGTGGCCATTACGGCCGGGAGATTACAGGTATTGCAAGTTTCAAGCCTGTC
ATAAAGACTCAAAGCCGCTTGTAATTTGTGTTTCCTAGTTGGGGAAGCTGTTTGTTCTTTATTGTGCTAT
ATGTATTTATTTGAAAGTTTGGATGAACTCAATAAATAAAAGAAAATCTTCATTGTGGGTTACAATTTGG
ACATGAACATGCATGAATAATGTACCAATTTAGCAAAAAAAAAGAAAAAAACAAAAAACAAATAGTCGGC
CGGCCCG