I'm trying to do the following in Python.
I have a file with the following contents:
<VirtualHost>
ServerName blah.com
DocumentRoot /var/www/blah.com
</Virtualhost>
<VirtualHost>
ServerName blah2.com
DocumentRoot /var/www/blah2.com
</Virtualhost>
... etc
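I'd like to put each of these VirtualHost containers into its own file (or at least get each one out separately; I can take it from there).
I can already get the text between the tags, but only without the tags themselves. What I want instead is output like this: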
<VirtualHost>
ServerName blah2.com
DocumentRoot /var/www/blah2.com
</Virtualhost>
...repeated for each container, rather than just:
ServerName blah2.com
DocumentRoot /var/www/blah2.com
Let me know if this can be done easily. Thanks!
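For comparison, the extraction described above can also be sketched without a regex or parser, as a line-by-line scan. This is a minimal sketch under assumptions: each container starts on a line beginning with `<VirtualHost` (matched case-insensitively, so arguments such as `*:80` are tolerated) and ends on a line beginning with `</VirtualHost>` in any case. The function name `split_vhosts` is hypothetical.

```python
def split_vhosts(text):
    """Return each <VirtualHost>...</VirtualHost> block, tags included (hypothetical helper)."""
    blocks = []
    current = None  # lines of the block being collected, or None when outside a block
    for line in text.splitlines():
        tag = line.strip().lower()  # case-insensitive comparison on the stripped line
        if tag.startswith("<virtualhost"):
            current = [line]  # an opening tag starts a new block
        elif current is not None:
            current.append(line)
            if tag.startswith("</virtualhost"):
                blocks.append("\n".join(current))  # a closing tag finishes the block
                current = None
    # A block that is never closed is silently dropped
    return blocks


sample = """<VirtualHost>
ServerName blah.com
DocumentRoot /var/www/blah.com
</Virtualhost>
<VirtualHost>
ServerName blah2.com
DocumentRoot /var/www/blah2.com
</Virtualhost>"""

for block in split_vhosts(sample):
    print(block)
    print("---")
```

Because only the VirtualHost lines are checked, nested containers such as `<Directory>` are simply carried along inside the enclosing block.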
Answer 0 (score: 0)
A regex with re.findall might work:
import re
d = """
<VirtualHost>
ServerName blah.com
DocumentRoot /var/www/blah.com
</Virtualhost>
<VirtualHost>
ServerName blah2.com
DocumentRoot /var/www/blah2.com
</Virtualhost>
"""
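# re.I: case-insensitive, so </Virtualhost> and </VirtualHost> both match
# re.DOTALL: '.' also matches newlines, so a match can span several lines
matches = re.findall(r'<VirtualHost>(.*?)</Virtualhost>', d, re.I | re.DOTALL)
#['\nServerName blah.com\nDocumentRoot /var/www/blah.com\n',
# '\nServerName blah2.com\nDocumentRoot /var/www/blah2.com\n']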
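Both the `?` in `.*?` and the `re.DOTALL` flag matter here. A greedy `.*` would run from the first `<VirtualHost>` to the last `</Virtualhost>` and return a single match, and without `re.DOTALL` the `.` cannot cross the newlines inside each container, so nothing matches at all. A small comparison on a shortened copy of the sample:

```python
import re

d = "<VirtualHost>\nServerName a\n</Virtualhost>\n<VirtualHost>\nServerName b\n</Virtualhost>"
flags = re.I | re.DOTALL  # case-insensitive; '.' also matches newlines

lazy = re.findall(r'<VirtualHost>(.*?)</Virtualhost>', d, flags)     # stops at the nearest closing tag
greedy = re.findall(r'<VirtualHost>(.*)</Virtualhost>', d, flags)    # runs to the last closing tag
no_dotall = re.findall(r'<VirtualHost>(.*?)</Virtualhost>', d, re.I)  # '.' cannot match '\n'

print(len(lazy))       # 2: one match per container
print(len(greedy))     # 1: both containers swallowed into one match
print(no_dotall)       # []: every container spans a newline
```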
Or, to include the <VirtualHost> tags themselves:
# Without a capturing group, findall returns the whole match, tags included
matches = re.findall(r'<VirtualHost>.*?</Virtualhost>', d, re.I | re.DOTALL)
#['<VirtualHost>\nServerName blah.com\nDocumentRoot /var/www/blah.com\n</Virtualhost>',
# '<VirtualHost>\nServerName blah2.com\nDocumentRoot /var/www/blah2.com\n</Virtualhost>']
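To get both forms in a single pass and actually write each container to its own file, as the question asks, `re.finditer` can be used: on each match object, `group(0)` is the whole container including its tags and `group(1)` is only the text between them. This is a sketch; the helper name `write_vhosts`, the output directory, and the `vhost_<n>.conf` naming scheme are hypothetical choices.

```python
import re
import tempfile
from pathlib import Path


def write_vhosts(text, out_dir):
    """Hypothetical helper: write each container to out_dir/vhost_<n>.conf, return the paths."""
    pattern = re.compile(r'<VirtualHost>(.*?)</Virtualhost>', re.I | re.DOTALL)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, m in enumerate(pattern.finditer(text), start=1):
        # m.group(0): whole container including tags; m.group(1) would be only the inner text
        path = out / f"vhost_{n}.conf"
        path.write_text(m.group(0) + "\n")
        paths.append(path)
    return paths


d = """
<VirtualHost>
ServerName blah.com
DocumentRoot /var/www/blah.com
</Virtualhost>
<VirtualHost>
ServerName blah2.com
DocumentRoot /var/www/blah2.com
</Virtualhost>
"""

# Demo in a throwaway directory; pass a real directory name instead to keep the files
with tempfile.TemporaryDirectory() as tmp:
    for p in write_vhosts(d, tmp):
        print(p.name)
        print(p.read_text())
```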
Answer 1 (score: 0)
Assuming your input data is XML, you could use minidom (as suggested by @Aesthete) or ElementTree:
import xml.dom.minidom as MD
import xml.etree.ElementTree as ET

# Named xml_input rather than input to avoid shadowing the built-in input().
# XML tag names are case-sensitive, so both closing tags are written </VirtualHost>.
xml_input = """<Document>
<VirtualHost>
ServerName blah.com
DocumentRoot /var/www/blah.com
</VirtualHost>
<VirtualHost>
ServerName blah2.com
DocumentRoot /var/www/blah2.com
</VirtualHost>
</Document>"""

domDoc = MD.parseString(xml_input)
etreeDoc = ET.fromstring(xml_input)

# List comprehensions instead of map(): in Python 3, map() returns a lazy iterator
miniDomOutput = [node.toxml() for node in domDoc.getElementsByTagName('VirtualHost')]
# encoding='unicode' makes tostring() return str instead of bytes;
# the element's tail text (the newline after its closing tag) is included
elementTreeOutput = [ET.tostring(elem, encoding='unicode') for elem in etreeDoc.findall('VirtualHost')]
print(miniDomOutput)
print(elementTreeOutput)
Output:
#['<VirtualHost>\nServerName blah.com\nDocumentRoot /var/www/blah.com\n</VirtualHost>', '<VirtualHost>\nServerName blah2.com\nDocumentRoot /var/www/blah2.com\n</VirtualHost>']
#['<VirtualHost>\nServerName blah.com\nDocumentRoot /var/www/blah.com\n</VirtualHost>\n', '<VirtualHost>\nServerName blah2.com\nDocumentRoot /var/www/blah2.com\n</VirtualHost>\n']
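One caveat with the XML approach: an Apache configuration file is usually not well-formed XML. Opening tags normally carry an address such as `<VirtualHost *:80>`, and the sample in the question closes with `</Virtualhost>` in a different case; either one makes an XML parser reject the input. A quick check with ElementTree (the helper name `parses_as_xml` is hypothetical):

```python
import xml.etree.ElementTree as ET


def parses_as_xml(text):
    """Return True if ElementTree accepts text as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


ok = "<Document><VirtualHost>ServerName blah.com</VirtualHost></Document>"
mismatched_case = "<Document><VirtualHost>ServerName blah.com</Virtualhost></Document>"
apache_style = "<Document><VirtualHost *:80>ServerName blah.com</VirtualHost></Document>"

print(parses_as_xml(ok))               # True
print(parses_as_xml(mismatched_case))  # False: XML tag names are case-sensitive
print(parses_as_xml(apache_style))     # False: "*:80" is not a valid XML attribute
```

For such files a regex is the more forgiving option, with the opening-tag pattern relaxed to allow arguments, for example `<VirtualHost[^>]*>`.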