Regex for Python needed

Time: 2018-06-02 16:03:51

Tags: python regex

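This is my string:

-rwxrwx--- Administrators/unknown      563092   0% 2018-05-29 02:16:49 E:/program files/bak fil/sql server (mssqlserver)/var/work.log

-rwxrwx--- kandep2/Domain Users      563092   0% 2018-05-29 02:16:49 E:/program files/bak fil/sql server (mssqlserver)/var/dummy.log

I want to capture everything except the 0% in groups. So far I have the following regex:

([rwexXst-]+) ([^1-9]+) +(\d+)+.+? +(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (.*)

This works for the first line but fails on the second, because the user name in the second line contains a digit.

How should I modify my regex to get output like this:

Group #1

-rwxrwx---
Administrators/unknown
563092
2018-05-29 02:16:49
E:/program files/bak fil/sql server (mssqlserver)/var/work.log

Group #2

-rwxrwx---
kandep2/Domain Users
563092
2018-05-29 02:16:49
E:/program files/bak fil/sql server (mssqlserver)/var/dummy.log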
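The failure described in the question can be reproduced in a few lines. This is only a sketch that runs the question's own pattern against both sample lines; the variable names `original`, `line1` and `line2` are made up for illustration. The second line fails because `[^1-9]+` stops at the `2` in `kandep2`, and the pattern then requires a space at that position.

```python
import re

# The pattern from the question, unchanged.
original = re.compile(
    r'([rwexXst-]+) ([^1-9]+) +(\d+)+.+? +(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (.*)'
)

line1 = '-rwxrwx--- Administrators/unknown      563092   0% 2018-05-29 02:16:49 E:/program files/bak fil/sql server (mssqlserver)/var/work.log'
line2 = '-rwxrwx--- kandep2/Domain Users      563092   0% 2018-05-29 02:16:49 E:/program files/bak fil/sql server (mssqlserver)/var/dummy.log'

# match() anchors at the start of the string.
print(original.match(line1) is not None)  # first line matches
print(original.match(line2) is not None)  # second line does not
```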

3 answers:

Answer 0: (score: 2)

I suggest named groups. Also, try to generalize your regex so that you don't have to count exact spaces.

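Try:

import re

pattern = re.compile(r'''(?P<rw>[rwexXst\-]+)\s+
(?P<dir>\w+(?:\s+\w+)?\/\w+(?:\s+\w+)?)\s+
(?P<nums>\d+)(?:.+\%)?\s+
(?P<date>\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2})\s+
(?P<msg>.*)$''', flags=re.M|re.X)

test_text = '''
-rwxrwx--- Administrators/unknown      563092   0% 2018-05-29 02:16:49 E:/program files/bak fil/sql server (mssqlserver)/var/work.log

-rwxrwx--- kandep2/Domain Users      563092   0% 2018-05-29 02:16:49 E:/program files/bak fil/sql server (mssqlserver)/var/dummy.log
'''

# Search each line separately; blank lines simply do not match.
for line in test_text.splitlines():
    match = pattern.search(line)
    if match:
        print(match.groupdict())
        print(match.groups())

Or even simpler:

for match in pattern.finditer(test_text):
    print(match.groupdict())
    print(match.groups())

Should give you:

{'rw': '-rwxrwx---', 'dir': 'Administrators/unknown', 'nums': '563092', 'date': '2018-05-29 02:16:49', 'msg': 'E:/program files/bak fil/sql server (mssqlserver)/var/work.log'}
('-rwxrwx---', 'Administrators/unknown', '563092', '2018-05-29 02:16:49', 'E:/program files/bak fil/sql server (mssqlserver)/var/work.log')
{'rw': '-rwxrwx---', 'dir': 'kandep2/Domain Users', 'nums': '563092', 'date': '2018-05-29 02:16:49', 'msg': 'E:/program files/bak fil/sql server (mssqlserver)/var/dummy.log'}
('-rwxrwx---', 'kandep2/Domain Users', '563092', '2018-05-29 02:16:49', 'E:/program files/bak fil/sql server (mssqlserver)/var/dummy.log')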
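One practical benefit of named groups is that each field can be pulled out by name and converted to a proper type. The following is a minimal sketch, assuming the same pattern as in this answer; treating `nums` as an integer and parsing `date` with `datetime.strptime` are illustrative choices, not something the answer itself prescribes.

```python
import re
from datetime import datetime

pattern = re.compile(r'''(?P<rw>[rwexXst\-]+)\s+
(?P<dir>\w+(?:\s+\w+)?\/\w+(?:\s+\w+)?)\s+
(?P<nums>\d+)(?:.+\%)?\s+
(?P<date>\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2})\s+
(?P<msg>.*)$''', flags=re.M | re.X)

line = '-rwxrwx--- kandep2/Domain Users      563092   0% 2018-05-29 02:16:49 E:/program files/bak fil/sql server (mssqlserver)/var/dummy.log'

match = pattern.search(line)
if match:
    # Access fields by group name instead of by position.
    size = int(match.group('nums'))
    when = datetime.strptime(match.group('date'), '%Y-%m-%d %H:%M:%S')
    print(match.group('dir'), size, when.isoformat())
```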

Answer 1: (score: 1)

You can use a verbose expression with named capture groups, like this:

(?P<rights>-[-rwx]+)\s+         # rights -> one of -,r,w,x
(?P<group>(?:(?!\s{2,}).)+)\s+  # anything not two consecutive whitespaces
(?P<uid>\d+)\s+                 # only digits
(?:[\d%]+)\s+                   # digits and %
(?P<date>[- :\d]+)\s+           # the date
(?P<filename>.+)                # and the filename
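The `group` part relies on a negative lookahead placed in front of every single character, which lets the match run through single spaces but stop before a run of two or more. As a small sketch of that construct in isolation (the name `until_gap` and the shortened sample text are made up for illustration):

```python
import re

# Consume one character at a time, but only while the upcoming
# text does not start with two or more whitespace characters.
until_gap = re.compile(r'(?:(?!\s{2,}).)+')

text = 'kandep2/Domain Users      563092'
print(until_gap.match(text).group())  # the single space is kept, the wide gap ends the match
```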

In Python, this is:

import re

data = """
-rwxrwx--- Administrators/unknown      563092   0% 2018-05-29 02:16:49 E:/program files/bak fil/sql server (mssqlserver)/var/work.log

-rwxrwx--- kandep2/Domain Users      563092   0% 2018-05-29 02:16:49 E:/program files/bak fil/sql server (mssqlserver)/var/dummy.log
"""

rx = re.compile(r'''
    (?P<rights>-[-rwx]+)\s+
    (?P<group>(?:(?!\s{2,}).)+)\s+
    (?P<uid>\d+)\s+
    (?:[\d%]+)\s+
    (?P<date>[- :\d]+)\s+
    (?P<filename>.+)''', re.M | re.X)

results = [m.groupdict() for m in rx.finditer(data)]
print(results)

This yields

[
    {'rights': '-rwxrwx---', 'group': 'Administrators/unknown', 'uid': '563092', 'date': '2018-05-29 02:16:49', 'filename': 'E:/program files/bak fil/sql server (mssqlserver)/var/work.log'}, 
    {'rights': '-rwxrwx---', 'group': 'kandep2/Domain Users', 'uid': '563092', 'date': '2018-05-29 02:16:49', 'filename': 'E:/program files/bak fil/sql server (mssqlserver)/var/dummy.log'}
]

The idea is to capture anything of interest and to merely match (or wrap in a non-capturing group) the "junk". See a demo for the expression on regex101.com
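To make the difference concrete, here is a small sketch comparing a capturing group with a non-capturing one on a fragment of the sample line. The fragment and the variable names are chosen only for illustration.

```python
import re

fragment = '563092   0% 2018-05-29'

# The percentage is captured and shows up in groups().
capturing = re.match(r'(\d+)\s+([\d%]+)\s+([-\d]+)', fragment)

# Same pattern, but the percentage is matched without being captured.
non_capturing = re.match(r'(\d+)\s+(?:[\d%]+)\s+([-\d]+)', fragment)

print(capturing.groups())
print(non_capturing.groups())
```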

Answer 2: (score: 1)

A solution without regular expressions.

from pprint import pprint

input_row = '-rwxrwx--- kandep2/Domain %Users      563092   0%% 2018-05-29 02:16:49 E:/program $files/bak fil/sql server (mssqlserver)/var/dummy.log'


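def parse(value):
    # Index of the ':' in the drive prefix (e.g. 'E:/'); the path starts one character earlier.
    v = value.index(':/')

    # Everything before the path, split on whitespace.
    row = value[:v-1].strip()
    row = row.split()

    # Dict values are evaluated top to bottom, so the order of the pop() calls matters.
    return {
        'rw': row.pop(0),
        'date': '%s %s' % (row.pop(-2), row.pop(-1)),
        'value': row.pop(-1),
        'nums': row.pop(-1),
        'msg': value[v-1:],
        'dir': ' '.join(row),
    }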


pprint(parse(input_row))

Result:

{'date': '2018-05-29 02:16:49',
 'dir': 'kandep2/Domain %Users',
 'msg': 'E:/program $files/bak fil/sql server (mssqlserver)/var/dummy.log',
 'nums': '563092',
 'rw': '-rwxrwx---',
 'value': '0%%'}