I have a file in which each block is delimited by a line containing only !, i.e.:
!
vserver XXXX
virtual XX.xx.XX.XX tcp 389
owner LDAP
serverfarm XXX
idle 5
persistent rebalance
inservice
!
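I want to get every section that contains vserver information. I tried to use a regular expression in Python, but I am having trouble handling the newlines.
I tried something like this: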
pattern = r"!\n vserver \S+\n "
Answer 0 (score: 5)
You need to tell Python that you are using a multi-line regular expression and that dot characters can match newlines:
>>> import re
>>> m = re.search(r'^!.*?^!', text, re.MULTILINE | re.DOTALL)
>>> m.group(0)
'!\n vserver XXXX\n virtual XX.xx.XX.XX tcp 389\n owner LDAP\n serverfarm XXX\n idle 5\n persistent rebalance\n inservice\n!'
If you want to get the name of the virtual server:
>>> m = re.search(r'^!.*?vserver\s+(\w+).*?^!', text, re.MULTILINE | re.DOTALL)
>>> m.group(0)
'!\n vserver XXXX\n virtual XX.xx.XX.XX tcp 389\n owner LDAP\n serverfarm XXX\n idle 5\n persistent rebalance\n inservice\n!'
>>> m.group(1)
'XXXX'
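One caveat when the file holds more than one block: with re.DOTALL, a greedy .* runs on to the last ! in the text, whereas a non-greedy .*? stops at the next one. A minimal sketch of the difference, assuming a made-up two-block sample (the names AAAA and BBBB are hypothetical):

```python
import re

# Hypothetical two-block sample; AAAA and BBBB are made-up names.
sample = "!\n vserver AAAA\n idle 5\n!\n vserver BBBB\n idle 9\n!\n"
flags = re.MULTILINE | re.DOTALL

greedy = re.search(r"^!.*^!", sample, flags).group(0)
lazy = re.search(r"^!.*?^!", sample, flags).group(0)

print(repr(greedy))  # spans both blocks, from the first '!' to the last
print(repr(lazy))    # stops at the second '!', so only the AAAA block
```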
Answer 1 (score: 1)
Try:
import re

# myfilename is the path to your config file
with open(myfilename) as f:
    stri = f.read()

pattern = r"^!\n vserver \S+\n[^!]+^!"
re.findall(pattern, stri, flags=re.M)
The regular expression, piece by piece:
^!\n -> match a solitary '!' on its own line followed by newline
 vserver \S+\n -> then a line of the form ' vserver NAME', where NAME is a run of non-whitespace characters
[^!]+ -> match the rest of the block, up to..
^! -> another solitary '!' on its own line.
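The same breakdown can be kept next to the pattern itself by compiling it with re.VERBOSE. This is only a sketch: the name block_re and the shortened sample are illustrative assumptions, not part of the original answer.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE. In verbose mode
# unescaped whitespace is ignored, so literal spaces are written as [ ].
block_re = re.compile(
    r"""
    ^!\n                 # a solitary '!' on its own line
    [ ]vserver[ ]\S+\n   # the ' vserver NAME' line
    [^!]+                # the rest of the block (no '!' allowed inside)
    ^!                   # the closing '!' at the start of a line
    """,
    re.MULTILINE | re.VERBOSE,
)

# Shortened sample in the question's format (leading space on each line).
sample = "!\n vserver XXXX\n owner LDAP\n inservice\n!\n"
print(block_re.findall(sample))
```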
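Depending on the specific information you want to extract, the regex can be refined.
For example, to extract the text after vserver, I can add capturing parentheses: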
pattern = r"^!\n vserver (\S+)\n[^!]+^!"
Then:
re.findall(pattern,stri,flags=re.M) # returns ['XXXX']
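If the file contains several consecutive blocks, note that neighbouring blocks share a single ! line. Because the pattern consumes the closing !, the next block no longer has an opening ! to match, so every other block may be skipped. One possible workaround is to turn the closing ^! into a lookahead. A sketch, assuming a made-up two-block input (AAAA and BBBB are hypothetical names):

```python
import re

# Hypothetical two-block sample; AAAA and BBBB are made-up names.
stri = "!\n vserver AAAA\n idle 5\n!\n vserver BBBB\n idle 9\n!\n"

consuming = r"^!\n vserver (\S+)\n[^!]+^!"
lookahead = r"^!\n vserver (\S+)\n[^!]+(?=^!)"

print(re.findall(consuming, stri, flags=re.M))  # the BBBB block is skipped
print(re.findall(lookahead, stri, flags=re.M))  # both names are found
```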
Answer 2 (score: 1)
This has the advantage of not having to read the whole file at once:
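from itertools import groupby

with open("data.txt") as infile:
    # Strip each line so that separator lines compare equal to "!"
    t = (line.strip() for line in infile)
    # groupby yields runs of non-"!" lines (key True) and runs of "!" lines (key False)
    for block in (j for i, j in groupby(t, '!'.__ne__) if i):
        block = list(block)
        if not block[0].startswith("vserver "):
            continue
        ...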
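To try the groupby approach without a file on disk, it can be wrapped in a generator that accepts any iterable of lines. This is a sketch under assumptions: iter_vserver_blocks and sample_lines are hypothetical names introduced here for illustration.

```python
from itertools import groupby

def iter_vserver_blocks(lines):
    """Yield each vserver block as a list of stripped lines."""
    stripped = (line.strip() for line in lines)
    for is_block, group in groupby(stripped, "!".__ne__):
        if not is_block:
            continue  # a run of '!' separator lines
        block = list(group)
        if block[0].startswith("vserver "):
            yield block

# Hypothetical input; an open file object could be passed instead.
sample_lines = ["junk\n", "!\n", " vserver XXXX\n", " owner LDAP\n",
                "!\n", " other YYYY\n", "!\n"]
blocks = list(iter_vserver_blocks(sample_lines))
print(blocks)
```

Passing an open file object keeps the streaming behaviour, since only one block is held in memory at a time.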
Answer 3 (score: 0)
teststr = """
sdafsad
!
vserver XXXX
virtual XX.xx.XX.XX tcp 389
owner LDAP
serverfarm XXX
idle 5
persistent rebalance
inservice
!
dsfdasfas
"""
import re
m = re.search("!\n[^!]*vserver[^!]*!", teststr)
print(m.group(0))
Answer 4 (score: 0)
I'm not a big fan of regular expressions. How about a list comprehension?
vserver_blocks = [block for block in data.split("!") if "vserver" in block]
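This works as long as ! never appears inside a line and the word vserver only occurs in the blocks you want. If either assumption might not hold for your data, a stricter variant splits only on lines consisting of a lone ! and checks how each block starts. A sketch with a made-up sample (the "note" line is hypothetical); it does fall back on one small regex for the split:

```python
import re

# Hypothetical sample: the second block only mentions "vserver" in passing
# and has a '!' in the middle of the text.
data = "!\n vserver XXXX\n owner LDAP\n!\n note see vserver XXXX!\n idle 5\n!\n"

# The one-liner above also picks up the fragment of the "note" block.
naive = [block for block in data.split("!") if "vserver" in block]

# Split only on lines that consist of a single '!', then test the block start.
blocks = re.split(r"^![ \t]*$", data, flags=re.M)
vserver_blocks = [b for b in blocks if b.lstrip().startswith("vserver ")]

print(len(naive))
print(vserver_blocks)
```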