我有2个文件。一个是包含多个fasta序列的fasta文件,而另一个文件包含我想要搜索的候选序列的名称(文件示例如下)。
seq.fasta
>Clone_18
GTTACGGGGGACACATTTTCCCTTCCAATGCTGCTTTCAGTGATAAATTGAGCATGATGGATGCTGATAATATCATTCCCGTGT
>Clone_23
GTTACGGGGGGCCGAAAAACACCCAATCTCTCTCTCGCTGAAACCCTACCTGTAATTTGCCTCCGATAGCCTTCCCCGGTGA
>Clone_27-1
GTTACGGGGACCACACCCTCACACATACAAACACAAACACTTCAAGTGACTTAGTGTGTTTCAGCAAAACATGGCTTC
>Clone_27-2
GTTACGGGGACCACACCCTCACACATACAAACACAAACACTTCAAGTGACTTAGTGTGTTTCAGCAAAACATGGCTTCGTTTTGTTCTAGATTAACTATCAGTTTGGTTCTGTTTGTCCTCGTACTGGGTTGTGTCAATGCACAACTT
>Clone_34-1
GTTACGGGGGAATAACAAAACTCACCAACTAACAACTAACTACTACTTCACTTTTCAACTACTTTACTACAATACTAAGAATGAAAACCATTCTCCTCATTATCTTTGCTCTCGCTCTTTTCACAAGAGCTCAAGTCCCTGGCTACCAAGCCATCG
>Clone_34-3
GTTACGGGGGAATAACAAAACTCACCAACTAACAACTAACTACTACTTCACTTTTCAACTACTTTACTACAATACTAAGAATGAAAACCATTCTCCTCATTATCTTTGCTCTCGCTCTTTTCACAAGAGCTCAAGTCCCTGGCTACCAAGCCATCGATATCGCTGAAGCCCAATC
>Clone_44-1
GTTACGGGGGAATCCGAATTCACAGATTCAATTACACCCTAAAATCTATCTTCTCTACTTTCCCTCTCTCCATTCTCTCTCACACACTGTCACACACATCC
>Clone_44-3
GTTACGGGGGAATCCGAATTCACAGATTCAATTACACCCTAAAATCTATCTTCTCTACTTTCCCTCTCTCCATTCTCTCTCACACACTGTCACACACATCCCGGCAGCGCAGCCGTCGTCTCTACCCTTCACCAGGAATAAGTTTATTTTTCTACTTAC
name.txt
Clone_23
Clone_27-1
我想使用AWK搜索fasta文件,并获取名称保存在另一个文件中的给定候选者的所有fasta序列。
awk 'NR==FNR{a[$1]=$1} BEGIN{RS="\n>"; FS="\n"} NR>FNR {if (match($1,">")) {sub(">","",$1)} for (p in a) {if ($1==p) print ">"$0}}' name.txt seq.fasta
问题是我只能在name.txt中提取第一个候选者的序列,就像这个
>Clone_23
GTTACGGGGGGCCGAAAAACACCCAATCTCTCTCTCGCTGAAACCCTACCTGTAATTTGCCTCCGATAGCCTTCCCCGGTGA
任何人都可以帮忙修复上面的单行awk命令吗?
答案 0 :(得分:2)
如果确定或甚至想要打印名称,您只需使用<!-- index.html -->
<!doctype html>
<!-- ASSIGN OUR ANGULAR MODULE -->
<html ng-app="diary">
<head>
<!-- META -->
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1"><!-- Optimize mobile viewport -->
<title>Mountain Diary</title>
<!-- SCROLLS -->
<link rel="stylesheet" href="//netdna.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css"><!-- load bootstrap -->
<style>
html { overflow-y:scroll; }
body { padding-top:50px; }
#course-list { margin-bottom:30px; }
</style>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/2.2.4/jquery.min.js"></script>
<script src="https://ajax.googleapis.com/ajax/libs/angularjs/1.5.6/angular.js"></script><!-- load angular -->
<script src="http://netdna.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min.js" type="text/javascript"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/ng-table/1.0.0/ng-table.js"></script>
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/ng-table/1.0.0/ng-table.css">
<script src="core.js"></script>
</head>
<!-- SET THE CONTROLLER AND GET ALL TODOS -->
<body ng-controller="mainController">
<div class="container">
<!-- HEADER AND TODO COUNT -->
<div class="jumbotron text-center">
<h1>Mountain Diary - courses: <span class="label label-info">{{ courses.length }}</span></h1>
</div>
<table ng-table="tableParams" class="table-striped" show-filter="true">
<tr ng-repeat="course in data">
<td data-title="'id'" sortable="'id'">{{course.id}}</td>
<td data-title="'date'" sortable="'date'">{{course.date | date}}</td>
<td data-title="'courseType'" sortable="'course.courseType'" filter="{'course.courseType': 'text'}">{{course.courseType | uppercase }} </td>
<td data-title="'place'">{{course.place}}</td>
<td data-title="'partners'">{{course.partners}}</td>
<td data-title="'description'">{{course.description}}</td>
<td data-title="'descriptionDetail'">{{course.descriptionDetail}}</td>
<td data-title="'descriptionUrl'">{{course.descriptionUrl}}</td>
<td data-title="'photoUrl'">{{course.photoUrl}}</td>
</tr>
</table>
</div>
</body>
</html>
:
grep
grep -Ff name.txt -A1 a.fasta
从-f name.txt
name.txt
将它们视为文字字符串而非正则表达式-F
打印匹配行以及后续行如果输出中不需要这些名称,我只需输入另一个A1
:
grep
above_command | grep -v '>'
解决方案可能如下所示:
awk
在多行版本中有更好的解释:
awk 'NR==FNR{n[$0];next} substr($0,2) in n && getline' name.txt a.fasta
答案 1 :(得分:2)
$ awk 'NR==FNR{n[">"$0];next} f{print f ORS $0;f=""} $0 in n{f=$0}' name.txt seq.fasta
>Clone_23
GTTACGGGGGGCCGAAAAACACCCAATCTCTCTCTCGCTGAAACCCTACCTGTAATTTGCCTCCGATAGCCTTCCCCGGTGA
>Clone_27-1
GTTACGGGGACCACACCCTCACACATACAAACACAAACACTTCAAGTGACTTAGTGTGTTTCAGCAAAACATGGCTTC