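This is my string:
-rwxrwx--- Administrators/unknown 563092 0% 2018-05-29 02:16:49 E:/program files/bak fil/sql server (mssqlserver)/var/work.log
-rwxrwx--- kandep2/Domain Users 563092 0% 2018-05-29 02:16:49 E:/program files/bak fil/sql server (mssqlserver)/var/dummy.log
I want to capture every field in its own group, except the 0%. So far I have the following regex:
([rwexXst-]+) ([^1-9]+) +(\d+)+.+? +(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (.*)
This works for the first line but fails on the second, because the username on the second line contains a digit.
How should I modify my regex to get output like this:
Group #1
-rwxrwx---
Administrators/unknown
563092
2018-05-29 02:16:49
E:/program files/bak fil/sql server (mssqlserver)/var/work.log
Group #2
-rwxrwx---
kandep2/Domain Users
563092
2018-05-29 02:16:49
E:/program files/bak fil/sql server (mssqlserver)/var/dummy.log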
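To make the failure concrete, here is a minimal sketch that runs the regex from the question against both lines. The variable names are only illustrative. The second group, `[^1-9]+`, rejects the digits 1 through 9, so it stops at the `2` in `kandep2`, and the literal space that must follow cannot match.

```python
import re

# The regex from the question, unchanged (split across two literals for width).
pattern = re.compile(
    r'([rwexXst-]+) ([^1-9]+) +(\d+)+.+? +'
    r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (.*)'
)

lines = [
    '-rwxrwx--- Administrators/unknown 563092 0% 2018-05-29 02:16:49 '
    'E:/program files/bak fil/sql server (mssqlserver)/var/work.log',
    '-rwxrwx--- kandep2/Domain Users 563092 0% 2018-05-29 02:16:49 '
    'E:/program files/bak fil/sql server (mssqlserver)/var/dummy.log',
]

# True where the regex matches from the start of the line.
results = [pattern.match(line) is not None for line in lines]
print(results)
```

Only the first line matches, which is the behaviour described above.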
Answer 0 (score: 2)
I suggest named groups. Also, try to generalize your regex so that you don't have to count the exact number of spaces.
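Try:
import re

pattern = re.compile(r'''(?P<rw>[rwexXst\-]+)\s+
(?P<dir>\w+(?:\s+\w+)?\/\w+(?:\s+\w+)?)\s+
(?P<nums>\d+)(?:.+\%)?\s+
(?P<date>\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2})\s+
(?P<msg>.*)$''', flags=re.M|re.X)

test_text = '''
-rwxrwx--- Administrators/unknown 563092 0% 2018-05-29 02:16:49 E:/program files/bak fil/sql server (mssqlserver)/var/work.log
-rwxrwx--- kandep2/Domain Users 563092 0% 2018-05-29 02:16:49 E:/program files/bak fil/sql server (mssqlserver)/var/dummy.log
'''

# Search each line separately; blank lines simply do not match.
for line in test_text.splitlines():
    match = pattern.search(line)
    if match:
        print(match.groupdict())
        print(match.groups())
Or even simpler:
for match in pattern.finditer(test_text):
    print(match.groupdict())
    print(match.groups())
Should give you:
{'rw': '-rwxrwx---', 'dir': 'Administrators/unknown', 'nums': '563092', 'date': '2018-05-29 02:16:49', 'msg': 'E:/program files/bak fil/sql server (mssqlserver)/var/work.log'}
('-rwxrwx---', 'Administrators/unknown', '563092', '2018-05-29 02:16:49', 'E:/program files/bak fil/sql server (mssqlserver)/var/work.log')
{'rw': '-rwxrwx---', 'dir': 'kandep2/Domain Users', 'nums': '563092', 'date': '2018-05-29 02:16:49', 'msg': 'E:/program files/bak fil/sql server (mssqlserver)/var/dummy.log'}
('-rwxrwx---', 'kandep2/Domain Users', '563092', '2018-05-29 02:16:49', 'E:/program files/bak fil/sql server (mssqlserver)/var/dummy.log')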
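As a follow-up sketch, this is what the two suggestions buy you in practice: with named groups each field is read by name instead of by position, and with `\s+` between fields the pattern tolerates any amount of padding. The sample line below has extra spaces added on purpose, and the variable names `owner` and `size` are illustrative only.

```python
import re

# Same pattern as in this answer; under re.X the line breaks and
# indentation inside the pattern are ignored.
pattern = re.compile(r'''
    (?P<rw>[rwexXst\-]+)\s+
    (?P<dir>\w+(?:\s+\w+)?/\w+(?:\s+\w+)?)\s+
    (?P<nums>\d+)(?:.+%)?\s+
    (?P<date>\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2})\s+
    (?P<msg>.*)$''', flags=re.M | re.X)

# A line with irregular padding between the columns (made up for this sketch).
line = ('-rwxrwx---   kandep2/Domain Users     563092  0% '
        '2018-05-29 02:16:49 E:/program files/bak fil/sql server '
        '(mssqlserver)/var/dummy.log')

match = pattern.search(line)
owner = match.group('dir')   # access by name with .group()
size = int(match['nums'])    # or with subscript syntax (Python 3.6+)
print(owner, size)
```

Reading fields by name also means the calling code keeps working if another group is later added to the pattern.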
Answer 1 (score: 1)
You can use a verbose expression with named capture groups, like this:
(?P<rights>-[-rwx]+)\s+ # rights -> one of -,r,w,x
(?P<group>(?:(?!\s{2,}).)+)\s+ # anything not two consecutive whitespaces
(?P<uid>\d+)\s+ # only digits
(?:[\d%]+)\s+ # digits and %
(?P<date>[- :\d]+)\s+ # the date
(?P<filename>.+) # and the filename
In Python, this would be:
import re
data = """
-rwxrwx--- Administrators/unknown 563092 0% 2018-05-29 02:16:49 E:/program files/bak fil/sql server (mssqlserver)/var/work.log
-rwxrwx--- kandep2/Domain Users 563092 0% 2018-05-29 02:16:49 E:/program files/bak fil/sql server (mssqlserver)/var/dummy.log
"""
rx = re.compile(r'''
(?P<rights>-[-rwx]+)\s+
(?P<group>(?:(?!\s{2,}).)+)\s+
(?P<uid>\d+)\s+
(?:[\d%]+)\s+
(?P<date>[- :\d]+)\s+
(?P<filename>.+)''', re.M | re.X)
results = [m.groupdict() for m in rx.finditer(data)]
print(results)
This yields
[
{'rights': '-rwxrwx---', 'group': 'Administrators/unknown', 'uid': '563092', 'date': '2018-05-29 02:16:49', 'filename': 'E:/program files/bak fil/sql server (mssqlserver)/var/work.log'},
{'rights': '-rwxrwx---', 'group': 'kandep2/Domain Users', 'uid': '563092', 'date': '2018-05-29 02:16:49', 'filename': 'E:/program files/bak fil/sql server (mssqlserver)/var/dummy.log'}
]
The idea is to capture anything of interest and merely match (or wrap in a non-capturing group) the "junk". See a demo for the expression on regex101.com.
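To illustrate the difference between capturing and merely matching, here is a small sketch on a fragment of the line. The fragment and the variable names are made up for the example. Both patterns match the same text, but the percentage only shows up in the result when its group is a capturing one.

```python
import re

sample = '563092  0% 2018-05-29'

# Percentage in a capturing group: it becomes part of the result.
capturing = re.search(r'(\d+)\s+([\d%]+)\s+([-\d]+)', sample)

# Percentage in a non-capturing group (?:...): matched, then dropped.
non_capturing = re.search(r'(\d+)\s+(?:[\d%]+)\s+([-\d]+)', sample)

print(capturing.groups())      # ('563092', '0%', '2018-05-29')
print(non_capturing.groups())  # ('563092', '2018-05-29')
```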
Answer 2 (score: 1)
A solution without regular expressions.
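from pprint import pprint

input_row = '-rwxrwx--- kandep2/Domain %Users 563092 0%% 2018-05-29 02:16:49 E:/program $files/bak fil/sql server (mssqlserver)/var/dummy.log'

def parse(value):
    # The path starts at the drive letter, one character before ':/'.
    v = value.index(':/')
    # Everything before the path, split on whitespace.
    row = value[:v-1].strip().split()
    # Dict values are evaluated top to bottom, so the order of the pops matters.
    return {
        'rw': row.pop(0),
        'date': '%s %s' % (row.pop(-2), row.pop(-1)),
        'value': row.pop(-1),
        'nums': row.pop(-1),
        'msg': value[v-1:],
        # Whatever is left is the user/group, which may contain spaces.
        'dir': ' '.join(row),
    }

pprint(parse(input_row))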
Result:
{'date': '2018-05-29 02:16:49',
'dir': 'kandep2/Domain %Users',
'msg': 'E:/program $files/bak fil/sql server (mssqlserver)/var/dummy.log',
'nums': '563092',
'rw': '-rwxrwx---',
'value': '0%%'}
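A slightly shorter variant of the same idea, as a sketch only: peel the permissions off the left with `split`, cut the path off at the drive letter, and take the four fixed fields from the right with `rsplit`, so that whatever remains is the user/group. The function name `parse_line` and the dictionary keys are hypothetical. It assumes every path starts with a single drive letter followed by `:/` and that the size, percentage, date, and time contain no spaces.

```python
def parse_line(line):
    # First whitespace-delimited token is the permission string.
    rights, rest = line.split(None, 1)
    # The path begins at the drive letter, one character before ':/'.
    cut = rest.index(':/') - 1
    head, path = rest[:cut], rest[cut:]
    # Split at most four times from the right; the remainder on the
    # left is the user/group, which may itself contain spaces.
    owner, size, percent, date, time = head.rsplit(None, 4)
    return {
        'rights': rights,
        'owner': owner,
        'size': int(size),
        'date': date + ' ' + time,
        'path': path,
    }

row = ('-rwxrwx--- kandep2/Domain Users 563092 0% 2018-05-29 02:16:49 '
       'E:/program files/bak fil/sql server (mssqlserver)/var/dummy.log')
parsed = parse_line(row)
print(parsed)
```

Like the original, this raises `ValueError` on a line without `:/`, so callers may want to skip or log such lines.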