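Extract the content between two strings, including the strings themselves

Asked: 2012-08-19 01:20:36

Tags: python regex

I am trying to do the following in Python.

I have a file with the following contents...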

<VirtualHost>
  ServerName blah.com
  DocumentRoot /var/www/blah.com
</Virtualhost>

<VirtualHost>
  ServerName blah2.com
  DocumentRoot /var/www/blah2.com
</Virtualhost>

... etc

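I want to put each of these VirtualHost containers into a separate file (or somewhere I can work with them)...

I am able to get the data between the two strings, but not the strings themselves. The output I want is...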

<VirtualHost>
  ServerName blah2.com
  DocumentRoot /var/www/blah2.com
</Virtualhost>

...iterated for each container, and not just...
ServerName blah2.com
DocumentRoot /var/www/blah2.com

If this is something that can be done easily, please let me know. Thanks!

2 Answers:

Answer 0: (score: 0)

A regex with re.findall should work:

import re

d = """
<VirtualHost>
  ServerName blah.com
  DocumentRoot /var/www/blah.com
</Virtualhost>
<VirtualHost>
  ServerName blah2.com
  DocumentRoot /var/www/blah2.com
</Virtualhost>
"""

matches = re.findall(r'<VirtualHost>(.*?)</Virtualhost>', d, re.I|re.DOTALL)

#['\n  ServerName blah.com\n  DocumentRoot /var/www/blah.com\n',
# '\n  ServerName blah2.com\n  DocumentRoot /var/www/blah2.com\n']
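If the goal is to work with the directives rather than the raw text, the captured bodies could be split into name/value pairs. This is only a minimal sketch, assuming every directive sits on its own line in the form `Name value`; `parse_directives` is a hypothetical helper written for this illustration, not something provided by `re`.

```python
import re

d = """
<VirtualHost>
  ServerName blah.com
  DocumentRoot /var/www/blah.com
</Virtualhost>
<VirtualHost>
  ServerName blah2.com
  DocumentRoot /var/www/blah2.com
</Virtualhost>
"""

# Capture only the text between the opening and closing tags.
bodies = re.findall(r'<VirtualHost>(.*?)</Virtualhost>', d, re.I | re.DOTALL)


def parse_directives(body):
    """Hypothetical helper: map each 'Name value' line to a dict entry."""
    directives = {}
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue  # skip the blank lines around the block body
        name, _, value = line.partition(' ')
        directives[name] = value
    return directives


hosts = [parse_directives(body) for body in bodies]
print(hosts)
```

Each entry in `hosts` is a plain dict, so a value such as `hosts[0]['ServerName']` can be looked up directly.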

Or, to include the <VirtualHost> tags:

matches = re.findall(r'<VirtualHost>.*?</Virtualhost>', d, re.I|re.DOTALL)

#['<VirtualHost>\n  ServerName blah.com\n  DocumentRoot /var/www/blah.com\n</Virtualhost>',
# '<VirtualHost>\n  ServerName blah2.com\n  DocumentRoot /var/www/blah2.com\n</Virtualhost>']
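Since the question asks for each container in its own file, the matches that include the tags could be written out one by one. The following is a rough sketch under assumptions: the `vhost_<n>.conf` naming scheme, the `split_vhosts` helper, and the temporary output directory are all made up for illustration.

```python
import re
import tempfile
from pathlib import Path

config_text = """
<VirtualHost>
  ServerName blah.com
  DocumentRoot /var/www/blah.com
</Virtualhost>
<VirtualHost>
  ServerName blah2.com
  DocumentRoot /var/www/blah2.com
</Virtualhost>
"""

# Same idea as above: non-greedy match, case-insensitive, '.' also matches newlines.
BLOCK_RE = re.compile(r'<VirtualHost>.*?</VirtualHost>', re.I | re.DOTALL)


def split_vhosts(text, out_dir):
    """Write each <VirtualHost> block in text to its own file and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, block in enumerate(BLOCK_RE.findall(text)):
        path = out_dir / f"vhost_{index}.conf"  # hypothetical naming scheme
        path.write_text(block + "\n")
        paths.append(path)
    return paths


# Hypothetical output location: a throwaway temporary directory.
written = split_vhosts(config_text, tempfile.mkdtemp())
print([p.name for p in written])
```

To read from a real file instead of an inline string, the text could be loaded first with something like `Path(some_path).read_text()`, where `some_path` is whatever file holds the configuration.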

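Answer 1: (score: 0)

Assuming your input is well-formed XML (a single root element, with closing tags whose case matches the opening tags), you can use minidom (as suggested by @Aesthete) or ElementTree: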

import xml.dom.minidom as MD
import xml.etree.ElementTree as ET

# Sample input, named xml_text to avoid shadowing the built-in input().
xml_text = """
<Document>
    <VirtualHost>
        ServerName blah.com
        DocumentRoot /var/www/blah.com
    </VirtualHost>
    <VirtualHost>
        ServerName blah2.com
        DocumentRoot /var/www/blah2.com
    </VirtualHost>
</Document>"""

domDoc = MD.parseString(xml_text)
etreeDoc = ET.fromstring(xml_text)

# Serialize each <VirtualHost> element back to markup.
# toxml() returns str; ET.tostring() returns bytes by default.
miniDomOutput = [node.toxml() for node in domDoc.getElementsByTagName('VirtualHost')]
elementTreeOutput = [ET.tostring(elem) for elem in etreeDoc.findall('VirtualHost')]

print(miniDomOutput)
print(elementTreeOutput)

Output:

#['<VirtualHost>\n        ServerName blah.com\n        DocumentRoot /var/www/blah.com\n    </VirtualHost>', '<VirtualHost>\n        ServerName blah2.com\n        DocumentRoot /var/www/blah2.com\n    </VirtualHost>']
#[b'<VirtualHost>\n        ServerName blah.com\n        DocumentRoot /var/www/blah.com\n    </VirtualHost>\n    ', b'<VirtualHost>\n        ServerName blah2.com\n        DocumentRoot /var/www/blah2.com\n    </VirtualHost>\n']
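Two details of the ElementTree output are worth noting: `ET.tostring()` returns bytes by default, and it includes the whitespace that follows each element (its tail). The sketch below shows one way to get clean `str` values. It also illustrates why the XML assumption matters: the snippet from the question closes `<VirtualHost>` with `</Virtualhost>`, which an XML parser rejects. The variable names here are chosen for illustration only.

```python
import xml.etree.ElementTree as ET

# The question's file is not valid XML: the closing tag differs in case.
raw = """
<VirtualHost>
  ServerName blah.com
</Virtualhost>
"""
try:
    ET.fromstring(raw)
    parsed = True
except ET.ParseError:
    parsed = False  # mismatched tag names are a parse error in XML

# With well-formed input, encoding="unicode" yields str instead of bytes,
# and strip() drops the trailing whitespace that belongs to the element's tail.
well_formed = """<Document>
    <VirtualHost>
        ServerName blah.com
    </VirtualHost>
    <VirtualHost>
        ServerName blah2.com
    </VirtualHost>
</Document>"""

root = ET.fromstring(well_formed)
blocks = [
    ET.tostring(elem, encoding="unicode").strip()
    for elem in root.findall("VirtualHost")
]
print(parsed)
print(blocks)
```

For a real Apache configuration file, which is not XML, the regex approach from the other answer is likely the simpler fit.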