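I have a piece of text from which I need to parse usernames and hashes. Right now I am using two regular expressions. Can I do this with a single multiline regular expression instead?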
#!/usr/bin/env python
import re
test_str = """
Hello, UserName.
Please read this looooooooooooooooong text. hash
Now, write down this hash: fdaf9399jef9qw0j.
Then keep reading this loooooooooong text.
Hello, UserName2.
Please read this looooooooooooooooong text. hash
Now, write down this hash: gtwnhton340gjr2g.
Then keep reading this loooooooooong text.
"""
# Raw strings with the trailing dot escaped, so it matches a literal "."
logins = re.findall(r'Hello, (?P<login>.+)\.', test_str)
hashes = re.findall(r'hash: (?P<hash>.+)\.', test_str)
Answer 0 (score: 5)
Try this:
re.findall(r'Hello, (?P<login>[^.]+)\..+?hash: (?P<hash>[^.]+)', test_str, re.S)
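A self-contained sketch of the pattern above, assuming the text keeps the layout shown in the question. Because `re.findall` returns plain tuples and drops the group names, this version uses `re.finditer` with `groupdict()` so that `login` and `hash` can be read back by name. The names `sample_text`, `pattern` and `records` are made up for illustration.

```python
import re

# Sample input copied from the question.
sample_text = """
Hello, UserName.
Please read this looooooooooooooooong text. hash
Now, write down this hash: fdaf9399jef9qw0j.
Then keep reading this loooooooooong text.
Hello, UserName2.
Please read this looooooooooooooooong text. hash
Now, write down this hash: gtwnhton340gjr2g.
Then keep reading this loooooooooong text.
"""

# re.S (alias of re.DOTALL) lets "." match newlines, so the lazy ".+?"
# can span the lines between the greeting and the "hash: " marker.
pattern = re.compile(
    r'Hello, (?P<login>[^.]+)\..+?hash: (?P<hash>[^.]+)',
    re.S,
)

# One dict per match, keyed by the group names.
records = [m.groupdict() for m in pattern.finditer(sample_text)]

for record in records:
    print(record['login'], record['hash'])
```

Each entry in `records` is a dict with the keys `login` and `hash`, which can be easier to work with than positional tuples when more groups are added later.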
Answer 1 (score: 2)
name_hash_pair = re.findall('Hello, ([^.]+).*?hash: ([^.]+)', test_str, re.DOTALL)
#gives [('UserName', 'fdaf9399jef9qw0j'), ('UserName2', 'gtwnhton340gjr2g')]
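The non-greedy `.*?` is what keeps each login paired with its own hash. A small sketch of the difference, using a shortened sample string invented here for illustration (the names `text`, `lazy` and `greedy` are likewise hypothetical):

```python
import re

# Shortened, made-up input with two greeting/hash pairs.
text = "Hello, A.\nhash: 111.\nHello, B.\nhash: 222.\n"

# Lazy ".*?" stops at the first "hash: " after each greeting.
lazy = re.findall(r'Hello, ([^.]+).*?hash: ([^.]+)', text, re.DOTALL)

# Greedy ".*" runs to the end and backtracks to the last "hash: ",
# so the first login is paired with the final hash and the second
# pair is swallowed.
greedy = re.findall(r'Hello, ([^.]+).*hash: ([^.]+)', text, re.DOTALL)

print(lazy)    # [('A', '111'), ('B', '222')]
print(greedy)  # [('A', '222')]
```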
Answer 2 (score: 2)
A simple pyparsing version:
from pyparsing import Word, SkipTo, alphas, alphanums
# a username starts with a letter, followed by letters, digits or "_"
username = Word(alphas, alphanums + "_")
hash = Word(alphanums)
patt = ("Hello," + username("username") + '.' +
        SkipTo("write down this hash:", include=True) +
        hash("hash"))
for tokens, start, end in patt.scanString(test_str):
    print(tokens.hash, '->', tokens.username)
# or to build a dict
hashNameLookup = dict((t.hash, t.username)
                      for t, s, e in patt.scanString(test_str))
Prints:
fdaf9399jef9qw0j -> UserName
gtwnhton340gjr2g -> UserName2