I am trying to use awk to extract the lines of a file in which $2 contains exon (some digit that is 1-99) sequence. The text will always be the same, but the number will vary.
File (tab-delimited):
Tier 2 exon 10 sequence xxxxx
Tier 2 full sequence yyyyy
Tier 1 exon 5 sequence aaaaa
Desired output (tab-delimited):
Tier 2 exon 10 sequence xxxxx
Tier 1 exon 5 sequence aaaaa
Answer 0 (score: 2)
Using awk:
awk '/exon\s+[0-9]+\s+sequence/ {print $0}' file
Or grep:
grep -P 'exon\s+[0-9]+\s+sequence' file
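One caveat about the two commands above: the `\s` shorthand is a GNU awk extension, and `grep -P` depends on GNU grep built with PCRE support, so neither is guaranteed to work on every system. Below is a minimal sketch of a more portable awk variant, assuming the sample data from the question. The `sample` and `matches` names are only illustrative, and `[ \t]+` matches spaces and tabs rather than every whitespace character.

```shell
# Recreate the question's tab-delimited sample in a temporary file.
sample=$(mktemp)
printf 'Tier 2\texon 10 sequence\txxxxx\n' >  "$sample"
printf 'Tier 2\tfull sequence\tyyyyy\n'    >> "$sample"
printf 'Tier 1\texon 5 sequence\taaaaa\n'  >> "$sample"

# [ \t]+ stands in for \s+ (spaces or tabs only), so no GNU extension is needed.
matches=$(awk '/exon[ \t]+[0-9]+[ \t]+sequence/' "$sample")
printf '%s\n' "$matches"

rm -f "$sample"
```

Like the original command, this matches against the whole line rather than a specific field, and `[0-9]+` accepts any number of digits.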
Answer 1 (score: 1)
Assuming:
awk 'BEGIN{FS="\t"; OFS="|"} $1=$1' file
Tier 2|exon 10 sequence|xxxxx
Tier 2|full sequence|yyyyy
Tier 1|exon 5 sequence|aaaaa
(i.e., the tabs are where the | characters appear above)
You can do:
$ awk -F"\t" '$2~/exon[ ]+[0-9][0-9]?/' /tmp/file
Tier 2 exon 10 sequence xxxxx
Tier 1 exon 5 sequence aaaaa
Answer 2 (score: 1)
awk '$3 ~ /exon/' file
Tier 2 exon 10 sequence xxxxx
Tier 1 exon 5 sequence aaaaa
Answer 3 (score: 1)
awk -F'\t' '$2 ~ /exon [1-9][0-9]? sequence/' file
Note that the regular expression for 1-99 is [1-9][0-9]?, not [0-9][0-9]?, because the latter also matches 0 (as well as 00, 01, and so on).
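To make the difference concrete, here is a small sketch that runs a strict and a loose pattern over some edge cases. The edge-case rows are hypothetical and not from the question, and the `^`/`$` anchors are an added assumption that the second field holds nothing besides the exon label.

```shell
# Hypothetical edge cases: only 1, 10 and 99 are inside the 1-99 range.
edge=$(mktemp)
for n in 0 00 01 1 10 99 100; do
    printf 'Tier 1\texon %s sequence\tzzzzz\n' "$n"
done > "$edge"

# Strict pattern, anchored to the whole second field.
strict=$(awk -F'\t' '$2 ~ /^exon [1-9][0-9]? sequence$/ {print $2}' "$edge")
# Looser pattern: allows a leading zero and does not check what follows the digits.
loose=$(awk -F'\t' '$2 ~ /exon [0-9][0-9]?/ {print $2}' "$edge")

printf 'strict:\n%s\n' "$strict"
printf 'loose:\n%s\n' "$loose"

rm -f "$edge"
```

The strict pattern keeps only the rows for 1, 10 and 99. The loose one accepts all seven rows, including 100, because nothing constrains the text after the first two digits.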