I have a file containing data in the following format:
1 AA/BB 0C89JG
2 ABANO/ANA VICTORIA F12LFJ
3 ABBOUDLASTNAME/ABBOUDF DWPTHC
4 ABDALLAH/SIJAM H0ZDM9
5 ABDEL MESSIH/DINA T0SF8N
6 ABHISHEK/PRAMANIK 7SLKXV
7 ABHYANKAR/DHANANJAY 7SM0BV
8 ABOUSALAMA/FEMKE LTTRQC
9 ABRAMOVA/NATALIA 77LCPZ
10 ABRANTES/JOAO KXZC7Q
11 ABRATH/LUC D5J99J
12 ABREO/HECTOR CXDH4G
13 ABREU/ANDREA 242GRC
14 ABREU/MARCELO 2436R7
15 ABREU/VANDA 3HDNQQ
16 ABTS/NATHALIE DSK9TN
17 ABTS/NATHALIE FZ0LN4
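I'm trying to extract the last 6 characters, for example FZ0LN4 on line 17. The regex I came up with is:
([0-9]{1,5})([A-Z /]) ([0-9A-Z]{6})
But so far it doesn't work. Can anyone point out what the problem is?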
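Answer 0 (score: 2)
There are a few problems:
[A-Z /]
is missing a repetition operator, so it matches exactly one character and can never cover a whole name. I would rewrite the regex like this: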
In [8]: re.match(r'\s*(\d+)\s*([A-Z /]+?)\s*(\w+)$', ' 15 ABREU/VANDA 3HDNQQ').groups()
Out[8]: ('15', 'ABREU/VANDA', '3HDNQQ')
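To make the effect of the missing repetition operator concrete, here is a minimal sketch that runs the pattern from the question next to a variant with + added to the name class. The variant, the sample line, and the variable names are illustrative assumptions, not part of the original answer.

```python
import re

line = "15 ABREU/VANDA 3HDNQQ"

# Pattern from the question: [A-Z /] matches exactly one character,
# so the multi-character name can never fit between the number and the code.
original = re.compile(r"([0-9]{1,5})([A-Z /]) ([0-9A-Z]{6})")

# Illustrative variant: an explicit space after the number and a
# repetition operator (+) on the name class.
fixed = re.compile(r"([0-9]{1,5}) ([A-Z /]+) ([0-9A-Z]{6})")

print(original.search(line))        # None
print(fixed.search(line).groups())  # ('15', 'ABREU/VANDA', '3HDNQQ')
```

The greedy + first swallows the code as well when it consists only of letters, but backtracking hands it back so the final group can still match.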
If you only need the last six characters, you don't need a regex at all:
In [15]: s = ' 15 ABREU/VANDA 3HDNQQ'
In [16]: s[-6:]
Out[16]: '3HDNQQ'
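One caveat with plain slicing: lines read from a file usually end with a newline, which would then count as one of the six characters. A minimal sketch, assuming trailing whitespace should be ignored (the helper name last_six and the sample list are hypothetical):

```python
def last_six(line):
    """Return the last six characters of a line, ignoring trailing whitespace."""
    return line.rstrip()[-6:]

# Lines as they would typically come out of a file, newline included.
lines = ["16 ABTS/NATHALIE DSK9TN\n", "17 ABTS/NATHALIE FZ0LN4\n"]

codes = [last_six(line) for line in lines]
print(codes)  # ['DSK9TN', 'FZ0LN4']
```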
Answer 1 (score: 2)
If you only need the string at the end of the line, you can use a simpler regex, for example: \b\w{6}\b$
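As a rough sketch of how that pattern could be applied to a single line in Python (the helper name code_at_end is hypothetical, and stripping trailing whitespace first is an assumption about the input):

```python
import re

# \b\w{6}\b$ : a run of exactly six word characters at the end of the string.
CODE_AT_END = re.compile(r"\b\w{6}\b$")

def code_at_end(line):
    """Return the six-character token that ends the line, or None if absent."""
    m = CODE_AT_END.search(line.rstrip())
    return m.group(0) if m else None

print(code_at_end("2 ABANO/ANA VICTORIA F12LFJ"))  # F12LFJ
print(code_at_end("1 AA/BB 0C89"))                 # None
```

Unlike slicing, this returns None when the line does not end in a six-character token, which makes malformed lines easier to spot.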
Answer 2 (score: 1)
Are you looking for the last line (17)? If so, search the whole string:
import re
myString="""
1 AA/BB 0C89JG
2 ABANO/ANA VICTORIA F12LFJ
3 ABBOUDLASTNAME/ABBOUDF DWPTHC
4 ABDALLAH/SIJAM H0ZDM9
5 ABDEL MESSIH/DINA T0SF8N
6 ABHISHEK/PRAMANIK 7SLKXV
7 ABHYANKAR/DHANANJAY 7SM0BV
8 ABOUSALAMA/FEMKE LTTRQC
9 ABRAMOVA/NATALIA 77LCPZ
10 ABRANTES/JOAO KXZC7Q
11 ABRATH/LUC D5J99J
12 ABREO/HECTOR CXDH4G
13 ABREU/ANDREA 242GRC
14 ABREU/MARCELO 2436R7
15 ABREU/VANDA 3HDNQQ
16 ABTS/NATHALIE DSK9TN
17 ABTS/NATHALIE FZ0LN4
"""
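# Without re.MULTILINE, $ matches only at the very end of the string
# (or just before a trailing newline), so this finds the last line's code.
m = re.search(r"(\S{6})$", myString)
if m:
    print(m.group(1))
If you need to find a specific line, iterate over the lines individually:
for line in myString.split("\n"):
    # Match only the line whose leading number is 17.
    m = re.search(r"^\s*17\s*.*(\S{6})$", line)
    if m:
        print(m.group(1))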
Answer 3 (score: 1)
This is easy to do without a regex:
st='''\
1 AA/BB 0C89JG
2 ABANO/ANA VICTORIA F12LFJ
3 ABBOUDLASTNAME/ABBOUDF DWPTHC
4 ABDALLAH/SIJAM H0ZDM9
5 ABDEL MESSIH/DINA T0SF8N
6 ABHISHEK/PRAMANIK 7SLKXV
7 ABHYANKAR/DHANANJAY 7SM0BV
8 ABOUSALAMA/FEMKE LTTRQC
9 ABRAMOVA/NATALIA 77LCPZ
10 ABRANTES/JOAO KXZC7Q
11 ABRATH/LUC D5J99J
12 ABREO/HECTOR CXDH4G
13 ABREU/ANDREA 242GRC
14 ABREU/MARCELO 2436R7
15 ABREU/VANDA 3HDNQQ
16 ABTS/NATHALIE DSK9TN
17 ABTS/NATHALIE FZ0LN4'''
for line in st.splitlines():
    # The code is the last whitespace-separated field on each line.
    print(line.split()[-1])
Prints:
0C89JG
F12LFJ
DWPTHC
H0ZDM9
T0SF8N
7SLKXV
7SM0BV
LTTRQC
77LCPZ
KXZC7Q
D5J99J
CXDH4G
242GRC
2436R7
3HDNQQ
DSK9TN
FZ0LN4
Or, if you only want the nth one, like this:
>>> li=[line.split()[-1] for line in st.splitlines()]
>>> li[-1]
'FZ0LN4'
>>> li[-2]
'DSK9TN' # etc etc
Or, if you really want a regex:
>>> re.findall(r'\s(\S{6})$',st,re.MULTILINE)
['0C89JG', 'F12LFJ', 'DWPTHC', 'H0ZDM9', 'T0SF8N', '7SLKXV', '7SM0BV', 'LTTRQC', '77LCPZ', 'KXZC7Q', 'D5J99J', 'CXDH4G', '242GRC', '2436R7', '3HDNQQ', 'DSK9TN', 'FZ0LN4']
>>> re.findall(r'\s(\S{6})$',st,re.MULTILINE)[-1]
'FZ0LN4'
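Answer 4 (score: 0)
Use the $ character, which matches the end of a line (with re.MULTILINE), and \S, which matches a non-whitespace character:
>>> import re
>>> s = ''' 1 AA/BB 0C89JG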
2 ABANO/ANA VICTORIA F12LFJ
3 ABBOUDLASTNAME/ABBOUDF DWPTHC
4 ABDALLAH/SIJAM H0ZDM9
5 ABDEL MESSIH/DINA T0SF8N
6 ABHISHEK/PRAMANIK 7SLKXV
7 ABHYANKAR/DHANANJAY 7SM0BV
8 ABOUSALAMA/FEMKE LTTRQC
9 ABRAMOVA/NATALIA 77LCPZ
10 ABRANTES/JOAO KXZC7Q
11 ABRATH/LUC D5J99J
12 ABREO/HECTOR CXDH4G
13 ABREU/ANDREA 242GRC
14 ABREU/MARCELO 2436R7
15 ABREU/VANDA 3HDNQQ
16 ABTS/NATHALIE DSK9TN
17 ABTS/NATHALIE FZ0LN4'''
>>> re.findall('\\S{6}$', s, re.MULTILINE)
['0C89JG', 'F12LFJ', 'DWPTHC', 'H0ZDM9', 'T0SF8N', '7SLKXV', '7SM0BV', 'LTTRQC', '77LCPZ', 'KXZC7Q', 'D5J99J', 'CXDH4G', '242GRC', '2436R7', '3HDNQQ', 'DSK9TN', 'FZ0LN4']