RegEx to find multiple two-letter instances?

Time: 2018-10-05 13:55:42

Tags: regex linux

1. 29250X 90 3 ASM123NO48JHF3M344
2. 29250X FD 3 DFWO3957NSTCVIKERH
3. 292505 3R 4 PGHU35N77P10C8WE0W
4. 292505 TH 4 8RJRO239F0117R5MFY

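I have a text file, codes.txt, containing codes in the format above. The criterion is that the second column must contain two consecutive letters for a line to be extracted; so, since (2) has FD and (4) has TH, both of those lines would be extracted into another file, results.txt. What RegEx command can I use to accomplish this?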

2 answers:

Answer 0 (score: 1)

To eliminate false positives, I would match as much of the line as possible.

egrep '^[0-9]+\. .{6,} [A-Z]{2} [0-9] .+' codes.txt > results.txt

Regex explanation:

^         Anchor to the beginning of the line
[0-9]+    Match 1 or more digits
\.        Followed by a period and a space
.{6,}     Followed by at least 6 but maybe more characters and a space
[A-Z]{2}  Followed by 2 Capital letters and a space
[0-9]     Then a digit and a space
.+        Then 1 or more characters
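
As a quick sanity check, the pattern can be run against the four sample lines from the question. This is only a sketch: it assumes the leading "1. " style numbering is literally part of each line in codes.txt (as the pattern itself does), and the names PATTERN, SAMPLE and matching_lines are made up for illustration. Python's re module is used here because this particular pattern means the same thing in it as in egrep.

```python
import re

# Same pattern as the egrep command; nothing in it is specific to egrep syntax.
PATTERN = re.compile(r'^[0-9]+\. .{6,} [A-Z]{2} [0-9] .+')

# The four sample lines from the question (numbering assumed to be in the file).
SAMPLE = [
    "1. 29250X 90 3 ASM123NO48JHF3M344",
    "2. 29250X FD 3 DFWO3957NSTCVIKERH",
    "3. 292505 3R 4 PGHU35N77P10C8WE0W",
    "4. 292505 TH 4 8RJRO239F0117R5MFY",
]


def matching_lines(lines):
    """Return the lines matched by PATTERN, in input order."""
    return [line for line in lines if PATTERN.search(line)]


if __name__ == "__main__":
    # Prints the lines numbered 2 and 4.
    for line in matching_lines(SAMPLE):
        print(line)
```

Lines 1 and 3 are rejected because "90" and "3R" are not two capital letters.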

Answer 1 (score: 0)

The following solution may work for you:

codes.txt

1. 29250X 90 3 ASM123NO48JHF3M344
2. 29250X FD 3 DFWO3957NSTCVIKERH
3. 292505 3R 4 PGHU35N77P10C8WE0W
4. 292505 TH 4 8RJRO239F0117R5MFY

Code:

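import re

# Match a two-letter token with a space on each side, anywhere in the line.
pa = re.compile(r' [a-zA-Z]{2} ')

# Copy every matching line from codes.txt to results.txt.
# The with statement closes both files even if an error occurs.
with open('./codes.txt', 'r') as src, open('./results.txt', 'w') as dst:
    for line in src:
        if pa.search(line):
            dst.write(line)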

Output (in results.txt):

2. 29250X FD 3 DFWO3957NSTCVIKERH
4. 292505 TH 4 8RJRO239F0117R5MFY
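
Note that the pattern ' [a-zA-Z]{2} ' accepts a two-letter token anywhere in the line, not only in the second column. A stricter variant, sketched below, ties the check to the column position. It rests on assumptions not stated in the question: columns are separated by whitespace, and the "N. " numbering may or may not be present in the file. The names SECOND_COLUMN and filter_codes are hypothetical.

```python
import re

# Optional "N. " index, then the first column, then a column of exactly two letters.
SECOND_COLUMN = re.compile(r'^(?:\d+\.\s+)?\S+\s+[A-Za-z]{2}\s')

SAMPLE = [
    "1. 29250X 90 3 ASM123NO48JHF3M344",
    "2. 29250X FD 3 DFWO3957NSTCVIKERH",
    "3. 292505 3R 4 PGHU35N77P10C8WE0W",
    "4. 292505 TH 4 8RJRO239F0117R5MFY",
]


def filter_codes(lines):
    """Keep only the lines whose second column is exactly two letters."""
    return [line for line in lines if SECOND_COLUMN.match(line)]


if __name__ == "__main__":
    # Prints the lines numbered 2 and 4.
    for line in filter_codes(SAMPLE):
        print(line)
```

With this pattern a line such as "29250X 3R 4 AB CD" is rejected, whereas the unanchored pattern would accept it because of the " AB " token later in the line. One caveat: a line whose first column itself looks like "7." would be read as unnumbered, so the variant is only as reliable as the assumption about the file layout.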