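I have two XML files and want to check whether a subpart of one is equal to the corresponding subpart of the other. The XML files can look very different, but here is an example. The first XML file: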
<EXECUTION_SET>
<RESULT query_id="7" >
<OP>
<PROJ>
<COLUMN col_name="City.ID" col_type="3" col_length="11" />
<OP>
<JOIN>
<OP>
<TABLE name="City" alias="City" access_type="5" total_record_length="67">
<COLUMN col_name="ID" col_type="3" col_length="11" raw_length="4" />
<COLUMN col_name="Name" col_type="254" col_length="35" raw_length="35" />
<COLUMN col_name="CountryCode" col_type="254" col_length="3" raw_length="3" />
<COLUMN col_name="District" col_type="254" col_length="20" raw_length="20" />
<COLUMN col_name="Population" col_type="3" col_length="11" raw_length="4" />
</TABLE>
</OP>
<OP>
<SEL>
<COND>
<COND>
<VALUE><VARIABLE>Country.Population</VARIABLE></VALUE>
</COND>
<BOP><![CDATA[>]]></BOP>
<COND>
<VALUE><CONSTANT>80000000</CONSTANT></VALUE>
</COND>
</COND>
<OP>
<OP>
<TABLE name="Country" alias="Country" access_type="5" total_record_length="261">
<COLUMN col_name="Code" col_type="254" col_length="3" raw_length="3" />
<COLUMN col_name="Name" col_type="254" col_length="52" raw_length="52" />
<COLUMN col_name="Continent" col_type="254" col_length="13" raw_length="1" />
<COLUMN col_name="Region" col_type="254" col_length="26" raw_length="26" />
<COLUMN col_name="SurfaceArea" col_type="4" col_length="10" raw_length="4" />
<COLUMN col_name="IndepYear" col_type="2" col_length="6" raw_length="2" />
<COLUMN col_name="Population" col_type="3" col_length="11" raw_length="4" />
<COLUMN col_name="LifeExpectancy" col_type="4" col_length="3" raw_length="4" />
<COLUMN col_name="GNP" col_type="4" col_length="10" raw_length="4" />
<COLUMN col_name="GNPOld" col_type="4" col_length="10" raw_length="4" />
<COLUMN col_name="LocalName" col_type="254" col_length="45" raw_length="45" />
<COLUMN col_name="GovernmentForm" col_type="254" col_length="45" raw_length="45" />
<COLUMN col_name="HeadOfState" col_type="254" col_length="60" raw_length="60" />
<COLUMN col_name="Capital" col_type="3" col_length="11" raw_length="4" />
<COLUMN col_name="Code2" col_type="254" col_length="2" raw_length="2" />
</TABLE>
</OP>
</OP>
</SEL>
</OP>
</JOIN>
</OP>
</PROJ>
</OP>
</RESULT>
</EXECUTION_SET>
The second XML file:
<EXECUTION_SET>
<RESULT query_id="13" >
<OP>
<PROJ>
<COLUMN col_name="Country.Code" col_type="254" col_length="3" />
<OP>
<SEL>
<COND>
<COND>
<VALUE><VARIABLE>Country.Population</VARIABLE></VALUE>
</COND>
<BOP><![CDATA[>]]></BOP>
<COND>
<VALUE><CONSTANT>80000000</CONSTANT></VALUE>
</COND>
</COND>
<OP>
<OP>
<TABLE name="Country" alias="Country" access_type="5" total_record_length="261">
<COLUMN col_name="Code" col_type="254" col_length="3" raw_length="3" />
<COLUMN col_name="Name" col_type="254" col_length="52" raw_length="52" />
<COLUMN col_name="Continent" col_type="254" col_length="13" raw_length="1" />
<COLUMN col_name="Region" col_type="254" col_length="26" raw_length="26" />
<COLUMN col_name="SurfaceArea" col_type="4" col_length="10" raw_length="4" />
<COLUMN col_name="IndepYear" col_type="2" col_length="6" raw_length="2" />
<COLUMN col_name="Population" col_type="3" col_length="11" raw_length="4" />
<COLUMN col_name="LifeExpectancy" col_type="4" col_length="3" raw_length="4" />
<COLUMN col_name="GNP" col_type="4" col_length="10" raw_length="4" />
<COLUMN col_name="GNPOld" col_type="4" col_length="10" raw_length="4" />
<COLUMN col_name="LocalName" col_type="254" col_length="45" raw_length="45" />
<COLUMN col_name="GovernmentForm" col_type="254" col_length="45" raw_length="45" />
<COLUMN col_name="HeadOfState" col_type="254" col_length="60" raw_length="60" />
<COLUMN col_name="Capital" col_type="3" col_length="11" raw_length="4" />
<COLUMN col_name="Code2" col_type="254" col_length="2" raw_length="2" />
</TABLE>
</OP>
</OP>
</SEL>
</OP>
</PROJ>
</OP>
</RESULT>
</EXECUTION_SET>
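With these two XML files, I want to compare a subpart under a tag and check whether it is equal in both. To select it, I run the following XPath query on each document: doc.select_nodes(std::string("//TABLE[@name='Country']/ancestor::SEL/COND").c_str());
and store the results in the variables pugi::xpath_node_set partialTree1 and partialTree2.
That gives me the subpart of each XML file. Now I want to compare the two subparts, and this should be done efficiently.
I have a solution, but I don't like it because it costs a lot of time and space. Currently I use the traverse function to walk through each subpart and write everything into a string, one string per subpart. Then I compare the two strings. But there must be a better way.
It would be great if someone had an idea.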
Answer 0 (score: 1)
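As a result of the queries, you have two sets of nodes from different documents. Each node has some number of descendants, and to compare two nodes you have to compare all of their descendants: if you skip one, you cannot know whether the trees are equal.
This means that comparing string dumps of the nodes is already optimal in terms of time complexity. It is not optimal in terms of space, though. The space-optimal algorithm walks both trees simultaneously and compares the values directly. Something like this should work: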
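#include <cstring>      // strcmp
#include <pugixml.hpp>

// Compares two iterator ranges element by element using pred.
// The ranges are equal only if both are exhausted at the same time.
template <typename It, typename Pred>
bool rangeEquals(It lb, It le, It rb, It re, Pred pred)
{
    It li = lb, ri = rb;

    for (; li != le && ri != re; ++li, ++ri)
        if (!pred(*li, *ri))
            return false;

    return li == le && ri == re;
}

// Attributes are equal if both name and value match.
// strcmp assumes pugi::char_t is char (no PUGIXML_WCHAR_MODE).
bool attrEquals(pugi::xml_attribute la, pugi::xml_attribute ra)
{
    return
        strcmp(la.name(), ra.name()) == 0 &&
        strcmp(la.value(), ra.value()) == 0;
}

// Nodes are equal if type, name, value, attributes (in order)
// and children (in order, recursively) all match.
bool nodeEquals(pugi::xml_node ln, pugi::xml_node rn)
{
    return
        ln.type() == rn.type() &&
        strcmp(ln.name(), rn.name()) == 0 &&
        strcmp(ln.value(), rn.value()) == 0 &&
        rangeEquals(ln.attributes_begin(), ln.attributes_end(), rn.attributes_begin(), rn.attributes_end(), attrEquals) &&
        rangeEquals(ln.begin(), ln.end(), rn.begin(), rn.end(), nodeEquals);
}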
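To make the space argument concrete, here is a library-independent sketch of the same lockstep walk. It is only an illustration: `Node`, `treeEquals`, `makeCond` and the `visited` counter are hypothetical names introduced here, not part of pugixml, and `Node` is a simplified stand-in for an XML element. No string is built for either tree, and the walk stops at the first difference.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an XML element; not a pugixml type.
struct Node {
    std::string name;
    std::string value;
    std::vector<std::pair<std::string, std::string>> attrs;  // document order
    std::vector<Node> children;                               // document order
};

// Walks both trees in lockstep and returns at the first difference.
// `visited` counts the node pairs inspected, to make the early exit visible.
bool treeEquals(const Node& l, const Node& r, std::size_t& visited) {
    ++visited;
    if (l.name != r.name || l.value != r.value || l.attrs != r.attrs)
        return false;
    if (l.children.size() != r.children.size())
        return false;
    for (std::size_t i = 0; i < l.children.size(); ++i)
        if (!treeEquals(l.children[i], r.children[i], visited))
            return false;
    return true;
}

// Builds a small subtree shaped like the COND element from the question:
// COND( COND(VALUE(VARIABLE)), BOP, COND(VALUE(CONSTANT)) ).
Node makeCond(const std::string& variable, const std::string& constant) {
    Node var{"VARIABLE", variable, {}, {}};
    Node cst{"CONSTANT", constant, {}, {}};
    Node bop{"BOP", ">", {}, {}};
    Node lhs{"COND", "", {}, {Node{"VALUE", "", {}, {var}}}};
    Node rhs{"COND", "", {}, {Node{"VALUE", "", {}, {cst}}}};
    return Node{"COND", "", {}, {lhs, bop, rhs}};
}
```

Two identical `makeCond` trees are reported equal after all 8 nodes have been inspected, while trees that differ in the `VARIABLE` text are rejected after 4 nodes. With pugixml the same pattern should carry over to the query results: check that `partialTree1.size()` equals `partialTree2.size()`, then call `nodeEquals(partialTree1[i].node(), partialTree2[i].node())` for each index, assuming both sets are in the same order.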