Question

I wrote a simple regular expression to use with pcregrep to return a given chromosome from a wiggle file (see below).

 pcregrep -M '^fixedStep chrom=2.*\n[0-9\n]*' input.txt

Input.wig

fixedStep chrom=1 start=14154 step=1
1
1
1
1
1
fixedStep chrom=2 start=14154 step=1
1
1
3
10
120
14
5
9
fixedStep chrom=2 start=20145 step=1
1
1
11
1
1
fixedStep chrom=2 start=30535 step=1
3
24
11
fixedStep chrom=3 start=14154 step=1
1
1
1
1
1

The output is:

fixedStep chrom=2 start=14154 step=1
1
1
3
10
120
14
5
9
fixedStep chrom=2 start=30124 step=1
fixedStep chrom=2 start=50345 step=1
4
23
90
fixedStep chrom=3 start=14154 step=1

But what I want to get is:

fixedStep chrom=2 start=14154 step=1
1
1
3
10
120
14
5
9
fixedStep chrom=2 start=20145 step=1
1
1
11
1
1
fixedStep chrom=2 start=30535 step=1
3
24
11

More specifically, I want to find every entry in the file that matches

fixedStep chrom=2 start=ANY step=1
1
2
3
4

and delete it, while keeping all the other chromosomes.

Edit:

I have partially solved the search problem; I can use

pcregrep -M '^fixed.*chrom=2.*(\n[0-9]+)*' input.txt

to get the correct output, but I have not yet found an efficient way to remove chromosome 2 from input.txt.

Answer 1

Can you use awk? If so, this should work:

awk '/chrom=2/{p=1}/chrom=[^2]/{p=0}p' input
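
Two caveats about the one-liner above: it prints the chromosome 2 blocks rather than removing them, and `/chrom=2/` would also match a header such as `chrom=20`. A minimal sketch that covers both cases is below. It assumes every block starts with a `fixedStep` header line carrying a whitespace-separated `chrom=NAME` field. The function names `extract_chrom` and `remove_chrom` are made up for illustration.

```shell
# Hypothetical helpers; assumes each block begins with a "fixedStep" header
# that has a whitespace-separated chrom=NAME field.

# Print only the blocks whose header is exactly chrom=<target>
# (so a target of 2 does not also select chrom=20).
extract_chrom() {
  awk -v target="$1" '
    /^fixedStep/ {
      keep = 0                      # re-decide at every header line
      for (i = 1; i <= NF; i++)
        if ($i == ("chrom=" target)) keep = 1
    }
    keep                            # print the line while inside a kept block
  ' "$2"
}

# Print everything except the blocks whose header is exactly chrom=<target>.
remove_chrom() {
  awk -v target="$1" '
    /^fixedStep/ {
      drop = 0                      # re-decide at every header line
      for (i = 1; i <= NF; i++)
        if ($i == ("chrom=" target)) drop = 1
    }
    !drop                           # print the line unless inside a dropped block
  ' "$2"
}
```

For example, `remove_chrom 2 input.txt > without_chrom2.txt` should write the file without the chromosome 2 blocks, and `extract_chrom 2 input.txt` should print only those blocks. The output file name here is just a placeholder.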
