Python beginner - building a regular expression with optional characters

Time: 2014-03-21 14:31:25

Tags: python regex

I'm trying to build a regular expression in Python that must match:

STRING

STRING STRING

STRING(STRING)STRING(STRING)

STRING(STRING)STRING(STRING)STRING(STRING)STRING
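A minimal Python 3 sketch of one pattern that accepts all four shapes, assuming each STRING is a run of word characters (`\w+`) and each parenthesised part holds a single word. It is an illustration under those assumptions, not the asker's or the answerers' pattern: the parenthesised pieces are a repeated group, and the optional second word is a group followed by `?`.

```python
import re

# Assumption: STRING = one or more word characters; each "(STRING)" may be
# followed by more word characters; an optional trailing " STRING" covers shape 2.
SHAPES = re.compile(r'\w+(?:\(\w+\)\w*)*(?:\s+\w+)?')

samples = [
    'STRING',
    'STRING STRING',
    'STRING(STRING)STRING(STRING)',
    'STRING(STRING)STRING(STRING)STRING(STRING)STRING',
]

for s in samples:
    # fullmatch requires the whole string to fit the pattern (Python 3.4+).
    print(s, bool(SHAPES.fullmatch(s)))

print(bool(SHAPES.fullmatch('STRING(STRING')))  # False: unclosed parenthesis
```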

I tried to do this with the optional metacharacter ?, but it doesn't work for the second pattern, STRING STRING: I only get the first character after the first STRING.

\w+\s+\w+? 

gives

STRING S

but it should give

STRING STRING

and match

STRING
STRING STRING
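In `\w+\s+\w+?`, a `?` placed right after a quantifier does not make the preceding item optional; it makes the `+` lazy, so `\w+?` is satisfied by a single character, which is why only `STRING S` comes back. To make the second word optional, the `?` has to follow a group that contains both the whitespace and the word. A minimal Python 3 sketch comparing the two:

```python
import re

text = 'STRING STRING'

# Lazy quantifier: \w+? stops as soon as one character has matched.
lazy = re.search(r'\w+\s+\w+?', text)
print(lazy.group())  # STRING S

# Optional group: (?:\s+\w+)? means "optionally, whitespace followed by a word".
optional = re.compile(r'\w+(?:\s+\w+)?')
print(optional.search(text).group())      # STRING STRING
print(optional.search('STRING').group())  # STRING
```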

Here is the full code:

import csv 
import re 
import sys 

fname = sys.argv[1] 

r = r'(\w+) access = (\w+)\s+Vol ID = (\w+)\s+Snap ID = (\w+)\s+Inode = (\w+)\s+IP = ((\d|\.)+)\s+UID = (\w+)\s+Full Path = (\S+)\s+Handle ID: (\S+)\s+Operation ID: (\S+)\s+Process ID: (\d+)\s+Image File Name: (\w+\s+\w+\s+\w+)\s+Primary User Name: (\S+)\s+Primary Domain: (\S+)\s+Primary Logon ID: (.....\s+......)\s+Client User Name: (\S+)\s+Client Domain: (\S+)\s+Client Logon ID: (\S+)'


regex = re.compile(r)

out = csv.writer(sys.stdout) 

f_hdl = open(fname, 'r')
csv_rdr = csv.reader(f_hdl)
header = True

for row in csv_rdr:
    #print row
    if header:
        header = False
    else:
        field = row[-1]

        res = re.search(regex, field)

        if res:
            audit_status = row[3]
            device = row[7]
            date_time = row[0]
            event_id = row[2]
            user = row[6]
            access_source = res.group(1) 
            access_type = res.group(2) 
            volume = res.group(3)
            snap = res.group(4)
            inode = res.group(5)
            ip = res.group(6)
            # group 7 is the inner (\d|\.) of the IP group, so UID is group 8
            uid = res.group(8)
            path = res.group(9)
            handle_id = res.group(10)
            operation_id = res.group(11)
            process_id = res.group(12)
            image_file_name = res.group(13)
            primary_user_name = res.group(14)
            primary_domain = res.group(15)
            primary_logon_id = res.group(16)
            client_user_name = res.group(17)
            client_domain = res.group(18)
            client_logon_id = res.group(19)

            print audit_status, device, date_time, event_id, user, access_source, access_type, volume, snap, inode, ip, uid, path
            out.writerow(
                    [audit_status, device, date_time, event_id, user, access_source, access_type, volume, snap, inode, ip, uid, path, handle_id, operation_id, process_id, image_file_name, primary_user_name, primary_domain, primary_logon_id, client_user_name, client_domain, client_logon_id]
                )
        else:
            print 'NOMATCH'
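The same idea applies to the full pattern: its `Image File Name: (\w+\s+\w+\s+\w+)` part only matches names of exactly three words. A hedged Python 3 sketch, using made-up log fragments since the real input isn't shown, of relaxing that group to one or more words with a repeated group. The greedy group backtracks until `\s+Primary User Name:` can match, so the captured name stops before that label.

```python
import re

# Made-up fragments; the real log layout is assumed here, not known.
one_word = 'Image File Name: explorer\n Primary User Name: alice'
three_words = 'Image File Name: Windows Explorer Shell\n Primary User Name: alice'

# Sub-pattern from the question: exactly three words.
strict = re.compile(r'Image File Name: (\w+\s+\w+\s+\w+)\s+Primary User Name: (\S+)')
# Relaxed sketch: one word, then any number of further whitespace-separated words.
relaxed = re.compile(r'Image File Name: (\w+(?:\s+\w+)*)\s+Primary User Name: (\S+)')

print(strict.search(one_word))               # None
print(relaxed.search(one_word).group(1))     # explorer
print(relaxed.search(three_words).group(1))  # Windows Explorer Shell
```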

Any suggestions?

2 answers:

Answer 0 (score: 2)

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

If it is a CSV file delimited by spaces and quoted with parentheses, use

csv.reader(csvfile, delimiter=' ', quotechar='(')
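A small Python 3 sketch of that call on in-memory data, assuming one record per line with fields separated by single spaces (the sample lines are made up). Keep in mind that `quotechar` is a single character, and the same character also closes a quoted field.

```python
import csv
import io

# Made-up input standing in for the real file: one record per line.
sample = io.StringIO('STRING\nSTRING STRING\n')

# Space-delimited reader with the arguments from the answer.
rows = list(csv.reader(sample, delimiter=' ', quotechar='('))
print(rows)  # [['STRING'], ['STRING', 'STRING']]
```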

If it is even simpler, use the string's split method and pad the result so that missing fields are filled with empty strings:

fields = field.split(' ')
fields = [i or j for i, j in map(None, fields, ('',) * 7)]
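`map(None, a, b)` is a Python 2 idiom that pairs up two sequences and pads the shorter one with `None`; in Python 3 it fails with a `TypeError` as soon as it is iterated. A Python 3 sketch of the same padding, assuming seven fields as in the answer (`split_padded` is a hypothetical helper name). For lines with at most seven fields it gives the same result as the snippet above.

```python
def split_padded(field, width=7):
    """Split on single spaces and pad with '' up to `width` fields.

    Hypothetical helper; lines with more than `width` fields are returned unpadded.
    """
    fields = field.split(' ')
    # [''] * n is an empty list when n <= 0, so long lines are left alone.
    return fields + [''] * (width - len(fields))


print(split_padded('STRING STRING'))
# ['STRING', 'STRING', '', '', '', '', '']
```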

Answer 1 (score: 1)

Try this regex string:

r = '(\\w+) access = (\\w+)\\s+Vol ID = (\\w+)\\s+Snap ID = (\\w+)\\s+Inode = (\\w+)\\s+IP = ((\\d|\\.)+)\\s+UID = (\\w+)\\s+Full Path = (\\S+)\\s+Handle ID: (\\S+)\\s+Operation ID: (\\S+)\\s+Process ID: (\\d+)\\s+Image File Name: (\\w+\\s+\\w+\\s+\\w+)\\s+Primary User Name: (\\S+)\\s+Primary Domain: (\\S+)\\s+Primary Logon ID: (.....\\s+......)\\s+Client User Name: (\\S+)\\s+Client Domain: (\\S+)\\s+Client Logon ID: (\\S+)\\s+Accesses: (.*)'
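Compared with the pattern in the question, this one is written as an ordinary string with doubled backslashes (which produces the same pattern text as the raw string) and appends a final `\s+Accesses: (.*)` group. A quick Python 3 check on a made-up fragment, showing that `(.*)` captures the rest of that line only, because `.` does not match a newline by default:

```python
import re

# Doubled backslashes in a normal string give the same text as a raw string.
assert '\\s+Client Logon ID: (\\S+)' == r'\s+Client Logon ID: (\S+)'

# Made-up fragment; (.*) stops at the end of the "Accesses:" line.
fragment = 'Client Logon ID: 0x0\n Accesses: ReadData\n Next line'
m = re.search(r'Client Logon ID: (\S+)\s+Accesses: (.*)', fragment)
print(m.group(1), m.group(2))  # 0x0 ReadData
```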