I am trying to build a regular expression in Python that must match:
STRING
STRING STRING
STRING(STRING)STRING(STRING)
STRING(STRING)STRING(STRING)STRING(STRING)STRING
I tried to do this with the optional metacharacter ?, but it does not work for the second pattern, STRING STRING: I only get the first character after the first string.
\w+\s+\w+?
gives
STRING S
but it should give
STRING STRING
and it should match both
STRING
STRING STRING
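The behaviour described above can be reproduced in a few lines. This is a minimal sketch written for Python 3; the names lazy, two_words and all_shapes are hypothetical, and the last pattern is only one possible way to cover the four shapes listed at the top, assuming each STRING consists of word characters. It shows that a ? placed directly after + makes the quantifier lazy rather than optional, whereas wrapping the tail in a group and putting ? after the group makes the tail optional.

```python
import re

# The pattern from the question: `+?` is a lazy quantifier, so the
# second \w+? stops after a single character.
lazy = re.compile(r'\w+\s+\w+?')

# Hypothetical fix: make the whole "whitespace + word" tail optional
# by wrapping it in a non-capturing group followed by `?`.
two_words = re.compile(r'\w+(?:\s+\w+)?')

# Hypothetical pattern covering all four shapes from the question:
# a word, optionally followed by either a second word or one or more
# "(WORD)" groups, each optionally trailed by another word.
all_shapes = re.compile(r'\w+(?:\s+\w+|(?:\(\w+\)\w*)+)?')

samples = [
    'STRING',
    'STRING STRING',
    'STRING(STRING)STRING(STRING)',
    'STRING(STRING)STRING(STRING)STRING(STRING)STRING',
]

print(lazy.match('STRING STRING').group())       # STRING S
print(two_words.match('STRING STRING').group())  # STRING STRING
print(two_words.match('STRING').group())         # STRING

for sample in samples:
    print(sample, bool(all_shapes.fullmatch(sample)))
```

A non-capturing group is used here so that the group numbers of any surrounding pattern do not shift.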
Here is the full code:
import csv
import re
import sys

fname = sys.argv[1]
r = r'(\w+) access = (\w+)\s+Vol ID = (\w+)\s+Snap ID = (\w+)\s+Inode = (\w+)\s+IP = ((\d|\.)+)\s+UID = (\w+)\s+Full Path = (\S+)\s+Handle ID: (\S+)\s+Operation ID: (\S+)\s+Process ID: (\d+)\s+Image File Name: (\w+\s+\w+\s+\w+)\s+Primary User Name: (\S+)\s+Primary Domain: (\S+)\s+Primary Logon ID: (.....\s+......)\s+Client User Name: (\S+)\s+Client Domain: (\S+)\s+Client Logon ID: (\S+)'
regex = re.compile(r)
out = csv.writer(sys.stdout)
f_hdl = open(fname, 'r')
csv_rdr = csv.reader(f_hdl)
header = True
for row in csv_rdr:
    # print row
    if header:
        # Skip the header row.
        header = False
    else:
        # The free-text message is the last column.
        field = row[-1]
        res = re.search(regex, field)
        if res:
            audit_status = row[3]
            device = row[7]
            date_time = row[0]
            event_id = row[2]
            user = row[6]
            access_source = res.group(1)
            access_type = res.group(2)
            volume = res.group(3)
            snap = res.group(4)
            inode = res.group(5)
            ip = res.group(6)
            # Group 7 is the inner (\d|\.) of the IP pattern, so UID is group 8.
            uid = res.group(8)
            path = res.group(9)
            handle_id = res.group(10)
            operation_id = res.group(11)
            process_id = res.group(12)
            image_file_name = res.group(13)
            primary_user_name = res.group(14)
            primary_domain = res.group(15)
            primary_logon_id = res.group(16)
            client_user_name = res.group(17)
            client_domain = res.group(18)
            client_logon_id = res.group(19)
            print audit_status, device, date_time, event_id, user, access_source, access_type, volume, snap, inode, ip, uid, path
            out.writerow(
                [audit_status, device, date_time, event_id, user, access_source, access_type, volume, snap, inode, ip, uid, path, handle_id, operation_id, process_id, image_file_name, primary_user_name, primary_domain, primary_logon_id, client_user_name, client_domain, client_logon_id]
            )
        else:
            print 'NOMATCH'
Any suggestions?
Answer 0: (score: 2)
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
If this is a CSV file that is delimited by spaces and quoted with parentheses, use
csv.reader(csvfile, delimiter=' ', quotechar='(')
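If it is even simpler than that, use the split method on the string and pad the result so that every field is filled, using empty strings for the missing ones: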
fields = field.split(' ')
fields = [i or j for i, j in map(None, fields, ('',) * 7)]
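The map(None, ...) idiom above relies on Python 2 behaviour; on Python 3, map requires a callable as its first argument. As a rough equivalent, here is a minimal sketch that pads the split result with empty strings. The helper name pad_fields and the default width of 7 (taken from the snippet above) are assumptions for illustration.

```python
def pad_fields(field, width=7):
    """Split on single spaces and pad with '' up to `width` entries."""
    parts = field.split(' ')
    # A negative repeat count yields an empty list, so longer inputs
    # are returned unchanged rather than truncated.
    return parts + [''] * (width - len(parts))


print(pad_fields('STRING STRING'))
# ['STRING', 'STRING', '', '', '', '', '']
```

Every row then has at least width entries, so later code can index the fields without checking the length first.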
Answer 1: (score: 1)
Try this regex string:
r = '(\\w+) access = (\\w+)\\s+Vol ID = (\\w+)\\s+Snap ID = (\\w+)\\s+Inode = (\\w+)\\s+IP = ((\\d|\\.)+)\\s+UID = (\\w+)\\s+Full Path = (\\S+)\\s+Handle ID: (\\S+)\\s+Operation ID: (\\S+)\\s+Process ID: (\\d+)\\s+Image File Name: (\\w+\\s+\\w+\\s+\\w+)\\s+Primary User Name: (\\S+)\\s+Primary Domain: (\\S+)\\s+Primary Logon ID: (.....\\s+......)\\s+Client User Name: (\\S+)\\s+Client Domain: (\\S+)\\s+Client Logon ID: (\\S+)\\s+Accesses: (.*)'
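If a field such as Image File Name can hold a varying number of words, one option is to replace the fixed \w+\s+\w+\s+\w+ group with a repeated optional tail. The sketch below (Python 3) uses a deliberately reduced pattern with only three of the fields, and the sample lines are made up for illustration rather than taken from a real log.

```python
import re

# Reduced, hypothetical version of the pattern: only three fields kept.
# (\w+(?:\s+\w+)*) accepts one or more whitespace-separated words; the
# literal label that follows forces the engine to backtrack to the
# right boundary.
pattern = re.compile(
    r'Process ID: (\d+)\s+'
    r'Image File Name: (\w+(?:\s+\w+)*)\s+'
    r'Primary User Name: (\S+)'
)

# Made-up sample lines, not real log data.
lines = [
    'Process ID: 4 Image File Name: System Primary User Name: admin',
    'Process ID: 4 Image File Name: System Idle Process Primary User Name: admin',
]

for line in lines:
    m = pattern.search(line)
    print(m.groups() if m else 'NOMATCH')
```

Because the inner group is non-capturing, the numbering of the remaining groups stays the same as with the fixed three-word version.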