awk to extract lines from a file that contain a matching pattern and a variable number

Time: 2017-07-03 21:28:38

Tags: regex awk

I am trying to use awk to extract the lines of file that contain exon (some digit that is 1-99) sequence in $2. The text will always be the same, but the number will vary.

File (tab-delimited):

Tier 2  exon 10 sequence    xxxxx
Tier 2  full sequence   yyyyy
Tier 1  exon 5 sequence aaaaa

Desired output (tab-delimited):

Tier 2  exon 10 sequence    xxxxx
Tier 1  exon 5 sequence aaaaa

4 answers:

Answer 0: (score: 2)

Using awk:

awk '/exon\s+[0-9]+\s+sequence/ {print $0}' file

Or grep:

grep -P 'exon\s+[0-9]+\s+sequence' file
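Note that `\s` is a GNU awk regexp operator and may not be available in other awk implementations. A minimal sketch of a more portable variant, assuming the separators are only spaces and tabs, spells the whitespace out as `[ \t]+` (the POSIX class `[[:space:]]` is another option). The `sample` variable is a stand-in for the question's file, built with printf.

```shell
# Stand-in for the question's tab-delimited file (built here for illustration).
sample=$(printf 'Tier 2\texon 10 sequence\txxxxx\nTier 2\tfull sequence\tyyyyy\nTier 1\texon 5 sequence\taaaaa\n')

# [ \t]+ matches one or more spaces or tabs, standing in for \s+.
# With no action given, awk prints each matching line.
matches=$(printf '%s\n' "$sample" | awk '/exon[ \t]+[0-9]+[ \t]+sequence/')

printf '%s\n' "$matches"
```

Like the original, this pattern uses `[0-9]+`, so it accepts any number of digits rather than only 1-99.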

Answer 1: (score: 1)

Given:

awk 'BEGIN{FS="\t"; OFS="|"} $1=$1' file 
Tier 2|exon 10 sequence|xxxxx
Tier 2|full sequence|yyyyy
Tier 1|exon 5 sequence|aaaaa

(i.e., the tabs are where the | characters appear above)

You can do:

$ awk -F"\t" '$2~/exon[ ]+[0-9][0-9]?/' /tmp/file 
Tier 2  exon 10 sequence    xxxxx
Tier 1  exon 5 sequence aaaaa

Answer 2: (score: 1)

awk '$3 ~ /exon/' file

Tier 2  exon 10 sequence    xxxxx
Tier 1  exon 5 sequence aaaaa
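This works with `$3` rather than `$2` because, without `-F`, awk splits on any run of whitespace, so "Tier" and "2" become separate fields. A small sketch of the difference, using one hypothetical line built with printf:

```shell
# One tab-delimited line, built here for illustration.
line=$(printf 'Tier 2\texon 10 sequence\txxxxx')

# Default splitting: fields are Tier, 2, exon, 10, sequence, xxxxx.
default_f3=$(printf '%s\n' "$line" | awk '{print $3}')

# Tab splitting: fields are "Tier 2", "exon 10 sequence", "xxxxx".
tab_f2=$(printf '%s\n' "$line" | awk -F'\t' '{print $2}')

printf 'default $3: %s\ntab $2: %s\n' "$default_f3" "$tab_f2"
```

Keep in mind that `$3 ~ /exon/` only checks for the word exon; it does not verify that a number in the 1-99 range follows.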

Answer 3: (score: 1)

awk -F'\t' '$2 ~ /exon [1-9][0-9]? sequence/' file

Note that the regex for 1-99 is [1-9][0-9]?, not [0-9][0-9]?, because the latter also matches 0 (as well as 00, 01, and so on).
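One way to check the difference is to run both patterns against a few edge cases. The sketch below assumes a hypothetical tab-delimited sample (not the asker's data) that adds rows for exon 0, exon 07, and exon 100, and prints only the first field of each matching line.

```shell
# Hypothetical sample: the two exon rows from the question plus three edge cases.
sample=$(printf 'Tier 1\texon 5 sequence\taaaaa\nTier 2\texon 10 sequence\txxxxx\nTier 3\texon 0 sequence\tbbbbb\nTier 4\texon 07 sequence\tccccc\nTier 5\texon 100 sequence\tddddd\n')

# [1-9][0-9]? accepts 1-99 only; the literal " sequence" after the digits rules out 100.
strict=$(printf '%s\n' "$sample" | awk -F'\t' '$2 ~ /exon [1-9][0-9]? sequence/ {print $1}')

# [0-9][0-9]? additionally accepts 0 and zero-padded values such as 07.
loose=$(printf '%s\n' "$sample" | awk -F'\t' '$2 ~ /exon [0-9][0-9]? sequence/ {print $1}')

printf 'strict:\n%s\nloose:\n%s\n' "$strict" "$loose"
```

With this sample, the strict pattern keeps Tier 1 and Tier 2, while the loose pattern also lets Tier 3 and Tier 4 through. Neither matches the exon 100 row, because both patterns require " sequence" immediately after at most two digits.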