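I have a list of strings. If any word in the list matches a line in the document,
I want to get as output the matching word and the number that appears on that line, usually right after the matched word. The word and the number are mostly separated by a space or a colon (:).
Example from the document: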
Expedien: 1-21-212-16-26
My list:
my_list = ['Reference', 'Ref.', 'tramite', 'Expedien']
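The number on the line with the matching string may or may not be separated by hyphens (-).
Examples: 1-21-22-45 or RE9833
In the second case, if a matching word is found on the line, RE9833 should be returned in full (not only the digits).
How can I write a regular expression for this in Python?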
Answer 0 (score: 0)
Input file:
$ cat input_file
Expedien: 1-21-212-16-26 #other garbage
Reference RE9833 #tralala
abc
123
456
Ref.: UV1234
tramite 1234567
Ref.:
Example:
import re

my_list = ['Reference', 'Ref.', 'tramite', 'Expedien']

# open the file as input
with open('input_file', 'r') as infile:
    # create an empty dict to store the pairs
    # that we will extract from the file
    res = dict()
    # for each input line
    for line in infile:
        # the only place where this code uses a regex:
        # split the line into a list of strings, using as separator
        # an optional ':' followed by one or more whitespace characters
        elems = re.split(r'(?::)?\s+', line)
        # we need at least 2 elements;
        # otherwise we move on to the next line
        if len(elems) >= 2:
            contains = False
            # tmp stores the key identified on this line
            tmp = ''
            # go through all the strings in this list
            for elem in elems:
                # when we enter this branch, the key was found on the
                # previous iteration and elem is its value
                if contains:
                    # store the pair in the dict,
                    # reset the flag and leave this loop
                    res.update({tmp: elem})
                    contains = False
                    break
                # check whether elem is in my_list
                if elem in my_list:
                    # if so, set contains to True and save the key in tmp
                    contains = True
                    tmp = elem

print(res)
Output:
$ python find_list.py
{'tramite': '1234567', 'Reference': 'RE9833', 'Expedien': '1-21-212-16-26', 'Ref.': ''}
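Note that 'Ref.' maps to an empty string in the output above: the last input line, "Ref.:", has no value, and since a dict keeps one value per key it overwrites the UV1234 stored earlier. If that is not what you want, one possible variation is sketched below. It is not part of the code above; extract_pairs is a hypothetical helper name, and the sample lines are the input file inlined as a list so the snippet runs on its own. It skips empty values and collects every value per key in a list:

```python
import re
from collections import defaultdict

my_list = ['Reference', 'Ref.', 'tramite', 'Expedien']

def extract_pairs(lines, keywords):
    """Map each keyword to the list of non-empty values that follow it."""
    res = defaultdict(list)
    for line in lines:
        # same separator as before: optional ':' followed by whitespace;
        # strip() removes the trailing newline so no empty last element appears
        elems = re.split(r':?\s+', line.strip())
        # look at each element together with its successor
        for key, value in zip(elems, elems[1:]):
            if key in keywords and value:
                res[key].append(value)
                break  # at most one pair per line, like the loop above
    return dict(res)

# the input file from above, inlined for illustration
sample = [
    'Expedien: 1-21-212-16-26 #other garbage\n',
    'Reference RE9833 #tralala\n',
    'abc\n',
    '123\n',
    '456\n',
    'Ref.: UV1234\n',
    'tramite 1234567\n',
    'Ref.:\n',
]

print(extract_pairs(sample, my_list))
```

With this version the bare "Ref.:" line contributes nothing, so 'Ref.' keeps UV1234.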
Regex demo: https://regex101.com/r/kSmLzW/3/
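If you prefer to do the whole extraction with a single regular expression, as the question asks, the following is a minimal sketch. It is my own assumption of what such a pattern could look like, not necessarily the pattern behind the demo link, and find_match is a hypothetical helper name. The idea is to build an alternation from my_list with re.escape (so the dot in 'Ref.' is matched literally), allow optional whitespace and a colon, and then capture a token of letters, digits and hyphens that contains at least one digit:

```python
import re

my_list = ['Reference', 'Ref.', 'tramite', 'Expedien']

# escape each word and try longer words first, so that a word
# is never shadowed by a shorter word that is its prefix
words = sorted(my_list, key=len, reverse=True)
keys = '|'.join(re.escape(w) for w in words)

# keyword, optional spaces and colon, then a token made of letters,
# digits and '-' that contains at least one digit
pattern = re.compile(
    r'(?<!\w)(?P<key>' + keys + r')\s*:?\s*'
    r'(?P<value>[A-Za-z0-9-]*\d[A-Za-z0-9-]*)'
)

def find_match(line):
    """Return (keyword, value) for the first match in line, or None."""
    m = pattern.search(line)
    return (m.group('key'), m.group('value')) if m else None

print(find_match('Expedien: 1-21-212-16-26 #other garbage'))
print(find_match('Reference RE9833 #tralala'))
print(find_match('Ref.:'))
```

Because the value must contain a digit, a line such as "Ref.:" with nothing after the keyword yields no match, and RE9833 is returned whole rather than only its digits. The sketch assumes the value follows the keyword directly; text between the keyword and the number would need a looser separator.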