I have a large XML file that contains details of image annotations. A sample of it is shown below:
<?xml version="1.0" encoding="UTF-8"?>
<dataset>
<name>dataset containing bounding box labels on images</name>
<comment>created by BBTag</comment>
<tags>
<tag name="ScoreBoard-Vivon" color="#bf5786"/>
<tag name="Perimeter-Vivon" color="#032585"/>
</tags>
<images>
<image file="/var/www/html/beacon.com/resources/videos/ST2_20170812/ST_2_20170812-0005.jpg">
<box top="253" left="166" width="56" height="24">
<label>Perimeter-Vivon</label>
</box>
<box top="255" left="229" width="61" height="21">
<label>Perimeter-Vivon</label>
</box>
<box top="254" left="290" width="58" height="23">
<label>Perimeter-Vivon</label>
</box>
<box top="254" left="361" width="56" height="20">
<label>Perimeter-Vivon</label>
</box>
<box top="254" left="417" width="63" height="22">
<label>Perimeter-Vivon</label>
</box>
<box top="254" left="486" width="63" height="20">
<label>Perimeter-Vivon</label>
</box>
<box top="504" left="329" width="51" height="29">
<label>ScoreBoard-Vivon</label>
</box>
</image>
</images>
</dataset>
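I want to split this file based on the tag name. This file has two tags, ScoreBoard and Perimeter, and I want to create a separate XML file for each tag. The required output is as follows:
ScoreBoard-Vivon.xml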
<?xml version="1.0" encoding="UTF-8"?>
<dataset>
<name>dataset containing bounding box labels on images</name>
<comment>created by BBTag</comment>
<tags>
<tag name="ScoreBoard-Vivon" color="#bf5786"/>
</tags>
<images>
<image file="/var/www/html/beacon.com/resources/videos/ST2_20170812/ST_2_20170812-0005.jpg">
<box top="504" left="329" width="51" height="29">
<label>ScoreBoard-Vivon</label>
</box>
</image>
</images>
</dataset>
Perimeter-Vivon.xml
<?xml version="1.0" encoding="UTF-8"?>
<dataset>
<name>dataset containing bounding box labels on images</name>
<comment>created by BBTag</comment>
<tags>
<tag name="Perimeter-Vivon" color="#032585"/>
</tags>
<images>
<image file="/var/www/html/beacon.com/resources/videos/ST2_20170812/ST_2_20170812-0005.jpg">
<box top="253" left="166" width="56" height="24">
<label>Perimeter-Vivon</label>
</box>
<box top="255" left="229" width="61" height="21">
<label>Perimeter-Vivon</label>
</box>
<box top="254" left="290" width="58" height="23">
<label>Perimeter-Vivon</label>
</box>
<box top="254" left="361" width="56" height="20">
<label>Perimeter-Vivon</label>
</box>
<box top="254" left="417" width="63" height="22">
<label>Perimeter-Vivon</label>
</box>
<box top="254" left="486" width="63" height="20">
<label>Perimeter-Vivon</label>
</box>
</image>
</images>
</dataset>
I have 350-400 such tags. How can I split them into individual files?
New example:
<?xml version="1.0" encoding="UTF-8"?>
<dataset>
<name>dataset containing bounding box labels on images</name>
<comment>created by BBTag</comment>
<tags>
<tag name="Perimeter-SVT" color="#f9e99c"/>
<tag name="Perimeter-Vivon" color="#032585"/>
<tag name="ScoreBoard-Vivon" color="#bf5786"/>
<tag name="Perimeter-StarSports" color="#12dadd"/>
</tags>
<images>
<image file="/var/www/html/tamsports.com/resources/videos/STAR_SPORTS_2_20170812/STAR_SPORTS_2_20170812-0011.jpg">
<box top="505" left="327" width="56" height="29">
<label>ScoreBoard-Vivon</label>
</box>
<box top="218" left="387" width="67" height="24">
<label>Perimeter-SVT</label>
</box>
</image>
<image file="/var/www/html/tamsports.com/resources/videos/STAR_SPORTS_2_20170812/STAR_SPORTS_2_20170812-0005.jpg">
<box top="254" left="159" width="64" height="23">
<label>Perimeter-Vivon</label>
</box>
<box top="255" left="225" width="61" height="20">
<label>Perimeter-Vivon</label>
</box>
<box top="254" left="285" width="63" height="23">
<label>Perimeter-Vivon</label>
</box>
<box top="253" left="357" width="58" height="24">
<label>Perimeter-Vivon</label>
</box>
<box top="254" left="424" width="56" height="25">
<label>Perimeter-Vivon</label>
</box>
<box top="256" left="484" width="65" height="23">
<label>Perimeter-Vivon</label>
</box>
<box top="507" left="326" width="58" height="26">
<label>ScoreBoard-Vivon</label>
</box>
</image>
<image file="/var/www/html/tamsports.com/resources/videos/STAR_SPORTS_2_20170812/STAR_SPORTS_2_20170812-0009.jpg">
<box top="249" left="400" width="59" height="29">
<label>Perimeter-StarSports</label>
</box>
</image>
</images>
</dataset>
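Answer 0 (score: 1)
One approach is to take the original XML, determine which <tags> are in use, then make a copy of the XML and remove all the tags that do not match: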
import xml.etree.ElementTree as ET
import copy
img_xml = """<?xml version="1.0" encoding="UTF-8"?>
<dataset>
<name>dataset containing bounding box labels on images</name>
<comment>created by BBTag</comment>
<tags>
<tag name="ScoreBoard-Vivon" color="#bf5786"/>
<tag name="Perimeter-Vivon" color="#032585"/>
</tags>
<images>
<image file="/var/www/html/beacon.com/resources/videos/ST2_20170812/ST_2_20170812-0005.jpg">
<box top="253" left="166" width="56" height="24">
<label>Perimeter-Vivon</label>
</box>
<box top="255" left="229" width="61" height="21">
<label>Perimeter-Vivon</label>
</box>
<box top="254" left="290" width="58" height="23">
<label>Perimeter-Vivon</label>
</box>
<box top="254" left="361" width="56" height="20">
<label>Perimeter-Vivon</label>
</box>
<box top="254" left="417" width="63" height="22">
<label>Perimeter-Vivon</label>
</box>
<box top="254" left="486" width="63" height="20">
<label>Perimeter-Vivon</label>
</box>
<box top="504" left="329" width="51" height="29">
<label>ScoreBoard-Vivon</label>
</box>
</image>
</images>
</dataset>
"""
root = ET.fromstring(img_xml)
tag_names = [tag.attrib['name'] for tag in root.find('tags')]
for tag_name in tag_names:
    root_copy = copy.deepcopy(root)
    # First remove unwanted tag
    # (clear() empties the element but leaves an empty <tag /> in place)
    for tag in root_copy.find('tags'):
        if tag.attrib['name'] != tag_name:
            tag.clear()
    # Now remove unwanted box
    # (likewise, a cleared box remains in the tree as an empty <box />)
    for box in root_copy.findall("./images/image/box"):
        if box[0].text != tag_name:
            box.clear()
    ET.ElementTree(root_copy).write('{}.xml'.format(tag_name))
This gives you two output XML files:
Perimeter-Vivon.xml
<dataset>
<name>dataset containing bounding box labels on images</name>
<comment>created by BBTag</comment>
<tags>
<tag /><tag color="#032585" name="Perimeter-Vivon" />
</tags>
<images>
<image file="/var/www/html/beacon.com/resources/videos/ST2_20170812/ST_2_20170812-0005.jpg">
<box height="24" left="166" top="253" width="56">
<label>Perimeter-Vivon</label>
</box>
<box height="21" left="229" top="255" width="61">
<label>Perimeter-Vivon</label>
</box>
<box height="23" left="290" top="254" width="58">
<label>Perimeter-Vivon</label>
</box>
<box height="20" left="361" top="254" width="56">
<label>Perimeter-Vivon</label>
</box>
<box height="22" left="417" top="254" width="63">
<label>Perimeter-Vivon</label>
</box>
<box height="20" left="486" top="254" width="63">
<label>Perimeter-Vivon</label>
</box>
<box /></image>
</images>
</dataset>
ScoreBoard-Vivon.xml
<dataset>
<name>dataset containing bounding box labels on images</name>
<comment>created by BBTag</comment>
<tags>
<tag color="#bf5786" name="ScoreBoard-Vivon" />
<tag /></tags>
<images>
<image file="/var/www/html/beacon.com/resources/videos/ST2_20170812/ST_2_20170812-0005.jpg">
<box /><box /><box /><box /><box /><box /><box height="29" left="329" top="504" width="51">
<label>ScoreBoard-Vivon</label>
</box>
</image>
</images>
</dataset>
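Note that `Element.clear()` only empties an element; the element itself stays in the tree, which is why the files above still contain empty `<tag />` and `<box />` entries. A minimal sketch of a variant that detaches the unwanted elements from their parents instead, and also drops any `<image>` left without boxes, might look like the following. The function name `split_by_tag`, the `SAMPLE` string and its `a.jpg`/`b.jpg` file names are made up for illustration and are not part of the original data:

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical, shortened input; a.jpg and b.jpg are placeholder file names.
SAMPLE = """<dataset>
<name>demo</name>
<tags>
<tag name="ScoreBoard-Vivon" color="#bf5786"/>
<tag name="Perimeter-Vivon" color="#032585"/>
</tags>
<images>
<image file="a.jpg">
<box top="253" left="166" width="56" height="24"><label>Perimeter-Vivon</label></box>
<box top="504" left="329" width="51" height="29"><label>ScoreBoard-Vivon</label></box>
</image>
<image file="b.jpg">
<box top="255" left="229" width="61" height="21"><label>Perimeter-Vivon</label></box>
</image>
</images>
</dataset>"""


def split_by_tag(root):
    """Return {tag name: pruned deep copy of root} with non-matching elements removed."""
    result = {}
    for tag_name in [tag.attrib['name'] for tag in root.find('tags')]:
        root_copy = copy.deepcopy(root)

        # Iterate over list(...) snapshots so that removing a child
        # does not make the loop skip its next sibling.
        tags = root_copy.find('tags')
        for tag in list(tags):
            if tag.attrib['name'] != tag_name:
                tags.remove(tag)

        images = root_copy.find('images')
        for image in list(images):
            for box in list(image.findall('box')):
                if box.findtext('label') != tag_name:
                    image.remove(box)
            # Drop images that have no box left for this tag.
            if image.find('box') is None:
                images.remove(image)

        result[tag_name] = root_copy
    return result


splits = split_by_tag(ET.fromstring(SAMPLE))
for name, pruned in splits.items():
    print(name, len(pruned.findall('./images/image/box')))
```

Each pruned tree could then be written out the same way as above, for example with `ET.ElementTree(pruned).write('{}.xml'.format(name), encoding='UTF-8', xml_declaration=True)` if the XML declaration should be kept.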
Answer 1 (score: 1)
The following (XSLT 2.0) stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:template match="//dataset/tags">
<xsl:for-each select="./tag">
<xsl:variable name="tagName" select="@name" />
<xsl:result-document method="xml" href="{$tagName}.xml">
<dataset>
<xsl:copy-of select="/dataset/name"/>
<xsl:copy-of select="/dataset/comment"/>
<tags>
<xsl:copy-of select="/dataset/tags/tag[./@name = $tagName]"/>
</tags>
<images>
<xsl:for-each select="/dataset/images/image[./box/label/text() = $tagName]">
<image>
<xsl:copy-of select="./@file"/>
<xsl:copy-of select="./box[./label[./text() = $tagName]]"/>
</image>
</xsl:for-each>
</images>
</dataset>
</xsl:result-document>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
when applied to your input, produces the following results:
Perimeter-SVT.xml
<?xml version="1.0" encoding="UTF-8"?>
<dataset xmlns:xs="http://www.w3.org/2001/XMLSchema">
<name>dataset containing bounding box labels on images</name>
<comment>created by BBTag</comment>
<tags>
<tag name="Perimeter-SVT" color="#f9e99c"/>
</tags>
<images>
<image file="/var/www/html/tamsports.com/resources/videos/STAR_SPORTS_2_20170812/STAR_SPORTS_2_20170812-0011.jpg">
<box top="218" left="387" width="67" height="24">
<label>Perimeter-SVT</label>
</box>
</image>
</images>
</dataset>
Perimeter-Vivon.xml
<?xml version="1.0" encoding="UTF-8"?>
<dataset xmlns:xs="http://www.w3.org/2001/XMLSchema">
<name>dataset containing bounding box labels on images</name>
<comment>created by BBTag</comment>
<tags>
<tag name="Perimeter-Vivon" color="#032585"/>
</tags>
<images>
<image file="/var/www/html/tamsports.com/resources/videos/STAR_SPORTS_2_20170812/STAR_SPORTS_2_20170812-0005.jpg">
<box top="254" left="159" width="64" height="23">
<label>Perimeter-Vivon</label>
</box>
<box top="255" left="225" width="61" height="20">
<label>Perimeter-Vivon</label>
</box>
<box top="254" left="285" width="63" height="23">
<label>Perimeter-Vivon</label>
</box>
<box top="253" left="357" width="58" height="24">
<label>Perimeter-Vivon</label>
</box>
<box top="254" left="424" width="56" height="25">
<label>Perimeter-Vivon</label>
</box>
<box top="256" left="484" width="65" height="23">
<label>Perimeter-Vivon</label>
</box>
</image>
</images>
</dataset>
ScoreBoard-Vivon.xml
<?xml version="1.0" encoding="UTF-8"?>
<dataset xmlns:xs="http://www.w3.org/2001/XMLSchema">
<name>dataset containing bounding box labels on images</name>
<comment>created by BBTag</comment>
<tags>
<tag name="ScoreBoard-Vivon" color="#bf5786"/>
</tags>
<images>
<image file="/var/www/html/tamsports.com/resources/videos/STAR_SPORTS_2_20170812/STAR_SPORTS_2_20170812-0011.jpg">
<box top="505" left="327" width="56" height="29">
<label>ScoreBoard-Vivon</label>
</box>
</image>
<image file="/var/www/html/tamsports.com/resources/videos/STAR_SPORTS_2_20170812/STAR_SPORTS_2_20170812-0005.jpg">
<box top="507" left="326" width="58" height="26">
<label>ScoreBoard-Vivon</label>
</box>
</image>
</images>
</dataset>
Perimeter-StarSports.xml
<?xml version="1.0" encoding="UTF-8"?>
<dataset xmlns:xs="http://www.w3.org/2001/XMLSchema">
<name>dataset containing bounding box labels on images</name>
<comment>created by BBTag</comment>
<tags>
<tag name="Perimeter-StarSports" color="#12dadd"/>
</tags>
<images>
<image file="/var/www/html/tamsports.com/resources/videos/STAR_SPORTS_2_20170812/STAR_SPORTS_2_20170812-0009.jpg">
<box top="249" left="400" width="59" height="29">
<label>Perimeter-StarSports</label>
</box>
</image>
</images>
</dataset>
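If no XSLT 2.0 processor is at hand, roughly the same grouping can be sketched with Python's standard library. This is not the stylesheet itself, only an approximation of its logic under a few assumptions: every `<box>` carries its label in a `<label>` child, and boxes whose label has no matching `<tag>` are simply skipped. It builds one new tree per declared tag and walks the boxes once rather than copying the whole document per tag, which may help with 350-400 tags. The names `group_by_label` and `write_outputs`, the `SAMPLE` string and its `a.jpg`/`b.jpg` file names are hypothetical:

```python
import copy
import os
import xml.etree.ElementTree as ET

# Hypothetical, shortened input; a.jpg and b.jpg are placeholder file names.
SAMPLE = """<dataset>
<name>demo</name>
<comment>created by BBTag</comment>
<tags>
<tag name="Perimeter-SVT" color="#f9e99c"/>
<tag name="ScoreBoard-Vivon" color="#bf5786"/>
</tags>
<images>
<image file="a.jpg">
<box top="505" left="327" width="56" height="29"><label>ScoreBoard-Vivon</label></box>
<box top="218" left="387" width="67" height="24"><label>Perimeter-SVT</label></box>
</image>
<image file="b.jpg">
<box top="507" left="326" width="58" height="26"><label>ScoreBoard-Vivon</label></box>
</image>
</images>
</dataset>"""


def group_by_label(root):
    """Build one new <dataset> tree per <tag>, visiting every <box> once."""
    outputs = {}
    for tag in root.find('tags'):
        dataset = ET.Element('dataset')
        # Copy the shared header elements, if present.
        for header in ('name', 'comment'):
            src = root.find(header)
            if src is not None:
                dataset.append(copy.deepcopy(src))
        ET.SubElement(ET.SubElement(dataset, 'tags'), 'tag', dict(tag.attrib))
        ET.SubElement(dataset, 'images')
        outputs[tag.get('name')] = dataset

    for image in root.iterfind('./images/image'):
        per_label = {}  # label -> <image> created in that label's tree
        for box in image.iterfind('box'):
            label = box.findtext('label')
            if label not in outputs:
                continue  # assumption: ignore labels with no declared <tag>
            if label not in per_label:
                images = outputs[label].find('images')
                per_label[label] = ET.SubElement(images, 'image', dict(image.attrib))
            per_label[label].append(copy.deepcopy(box))
    return outputs


def write_outputs(outputs, directory='.'):
    """Write each tree to <directory>/<tag name>.xml with an XML declaration."""
    for name, dataset in outputs.items():
        path = os.path.join(directory, name + '.xml')
        ET.ElementTree(dataset).write(path, encoding='UTF-8', xml_declaration=True)


outputs = group_by_label(ET.fromstring(SAMPLE))
print(ET.tostring(outputs['Perimeter-SVT'], encoding='unicode'))
```

As with the stylesheet, a tag that is declared but never used still gets a file, just with an empty `<images>` element. Calling `write_outputs(outputs)` would write the files into the current directory.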