1. 29250X 90 3 ASM123NO48JHF3M344
2. 29250X FD 3 DFWO3957NSTCVIKERH
3. 292505 3R 4 PGHU35N77P10C8WE0W
4. 292505 TH 4 8RJRO239F0117R5MFY
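I have a text file, codes.txt, containing codes in the format above. The criterion is that a line should be extracted only if its second column consists of two consecutive letters. Since line (2) has FD and line (4) has TH, both of those lines should be extracted to another file, results.txt. What regex command can I use to do this?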
Answer 0 (score: 1)
To eliminate false positives, I would match as much of each line as possible.
egrep '^[0-9]+\. .{6,} [A-Z]{2} [0-9] .+' codes.txt > results.txt
Regex explained:
^ Anchor to the beginning of the line
[0-9]+ Match 1 or more digits
\. Followed by a period and a space
.{6,} Followed by at least 6 but maybe more characters and a space
[A-Z]{2} Followed by 2 Capital letters and a space
[0-9] Then a digit and a space
.+ Then 1 or more characters
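To check the pattern against the sample data without touching any files, here is a minimal Python sketch. It assumes the four sample lines from the question, numbering prefix included, and reuses the same expression with Python's re module; the names LINE_RE, SAMPLE_LINES and matching_lines are made up for this illustration and are not part of the original answer.

```python
import re

# The same pattern as the egrep command above, compiled for Python's re module.
LINE_RE = re.compile(r'^[0-9]+\. .{6,} [A-Z]{2} [0-9] .+')

# Sample lines copied from the question (the "N. " prefix is assumed
# to be part of the file, as the pattern itself expects).
SAMPLE_LINES = [
    "1. 29250X 90 3 ASM123NO48JHF3M344",
    "2. 29250X FD 3 DFWO3957NSTCVIKERH",
    "3. 292505 3R 4 PGHU35N77P10C8WE0W",
    "4. 292505 TH 4 8RJRO239F0117R5MFY",
]


def matching_lines(lines):
    """Return the lines that the full-line pattern accepts."""
    return [line for line in lines if LINE_RE.search(line)]


if __name__ == "__main__":
    for line in matching_lines(SAMPLE_LINES):
        print(line)
```

On this sample only lines 2 and 4 pass, because "90" and "3R" are not two capital letters.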
Answer 1 (score: 0)
The following solution may work for you:
codes.txt
1. 29250X 90 3 ASM123NO48JHF3M344
2. 29250X FD 3 DFWO3957NSTCVIKERH
3. 292505 3R 4 PGHU35N77P10C8WE0W
4. 292505 TH 4 8RJRO239F0117R5MFY
Code:
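import re

# Two letters with a space on each side, i.e. a two-letter column.
pa = re.compile(r' [a-zA-Z]{2} ')

# The with statement closes both files even if an error occurs.
with open('./codes.txt', 'r') as src, open('./results.txt', 'w') as dst:
    for line in src:
        # Copy only the lines that contain a two-letter column.
        if pa.search(line):
            dst.write(line)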
Output (in results.txt):
2. 29250X FD 3 DFWO3957NSTCVIKERH
4. 292505 TH 4 8RJRO239F0117R5MFY
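The pattern ' [a-zA-Z]{2} ' accepts a two-letter token anywhere in the line, so a stricter variant may be preferable if other columns could ever hold two letters. The sketch below splits each line on whitespace and tests one field only. It assumes the "N. " prefix is part of the file, which puts the code in the third whitespace-separated field (index 2); the helper names and the column parameter are hypothetical, not from the original answer.

```python
import re

# Exactly two ASCII letters, nothing else.
TWO_LETTERS = re.compile(r'[A-Za-z]{2}')


def has_two_letter_code(line, column=2):
    """True if the whitespace-separated field at `column` is exactly two letters.

    column=2 is an assumption: it skips the "N." prefix and the first code.
    """
    fields = line.split()
    return len(fields) > column and TWO_LETTERS.fullmatch(fields[column]) is not None


def filter_lines(lines, column=2):
    """Keep only the lines whose chosen field is a two-letter code."""
    return [line for line in lines if has_two_letter_code(line, column)]


if __name__ == "__main__":
    # Same file names as in the answer above.
    with open('./codes.txt', 'r') as src, open('./results.txt', 'w') as dst:
        dst.writelines(filter_lines(src))
```

If the numbering prefix is not actually in the file, passing column=1 would target the second field instead.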