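I'm quite comfortable working with XML, and I'm trying to figure out how to query one tree based on another tree. Basically, I have two XML files.
The beginning of the first file: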
<rootNode>
<splitNode leftSplit="Local-gov,Federal-gov,State-gov" rightSplit="Self-emp-inc,Private,Self-emp-not-inc,?" splitAttr="workclass" splitType="discrete">
<splitNode splitAttr="hours-per-week" splitType="continuous" splitVal="39.53">
<leafNode incomeLevel="&lt;=50K">Leaf</leafNode>
<splitNode leftSplit="10th,Assoc-voc,Some-college,Masters,7th-8th" rightSplit="HS-grad,Bachelors,9th,12th,Assoc-acdm" splitAttr="education" splitType="discrete">
<splitNode leftSplit="United-States" rightSplit="?" splitAttr="native-country" splitType="discrete">
<splitNode splitAttr="capital-gain" splitType="continuous" splitVal="1554.0">
<splitNode splitAttr="education-num" splitType="continuous" splitVal="9.55">
<leafNode incomeLevel="&lt;=50K">Leaf</leafNode>
<splitNode splitAttr="education-num" splitType="continuous" splitVal="10.56">
<splitNode splitAttr="hours-per-week" splitType="continuous" splitVal="41.0">
<leafNode incomeLevel="&lt;=50K">Leaf</leafNode>
<splitNode splitAttr="hours-per-week" splitType="continuous" splitVal="43.5">
<leafNode incomeLevel=">50K">Leaf</leafNode>
<leafNode incomeLevel="&lt;=50K">Leaf</leafNode>
</splitNode>
</splitNode>
<splitNode splitAttr="education-num" splitType="continuous" splitVal="12.5">
<leafNode incomeLevel=">50K">Leaf</leafNode>
<leafNode incomeLevel="&lt;=50K">Leaf</leafNode>
</splitNode>
</splitNode>
</splitNode>
<leafNode incomeLevel=">50K">Leaf</leafNode>
</splitNode>
<leafNode incomeLevel="&lt;=50K">Leaf</leafNode>
</splitNode>
The beginning of the second file:
<People>
<Person age="50" capital-gain="0" capital-loss="0" education="Bachelors" education-num="13" fnlwgt="83311" hours-per-week="13" income-level="&lt;=50K" marital-status="Married-civ-spouse" native-country="United-States" occupation="Exec-managerial" race="White" relationship="Husband" sex="Male" workclass="Self-emp-not-inc"/>
<Person age="53" capital-gain="0" capital-loss="0" education="11th" education-num="7" fnlwgt="234721" hours-per-week="40" income-level="&lt;=50K" marital-status="Married-civ-spouse" native-country="United-States" occupation="Handlers-cleaners" race="Black" relationship="Husband" sex="Male" workclass="Private"/>
<Person age="31" capital-gain="14084" capital-loss="0" education="Masters" education-num="14" fnlwgt="45781" hours-per-week="50" income-level=">50K" marital-status="Never-married" native-country="United-States" occupation="Prof-specialty" race="White" relationship="Not-in-family" sex="Female" workclass="Private"/>
<Person age="30" capital-gain="0" capital-loss="0" education="Bachelors" education-num="13" fnlwgt="141297" hours-per-week="40" income-level=">50K" marital-status="Married-civ-spouse" native-country="India" occupation="Prof-specialty" race="Asian-Pac-Islander" relationship="Husband" sex="Male" workclass="State-gov"/>
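What I want to do is run each person in the second file through the decision tree defined by the split nodes in the first file. So, for each person, I would start at the first split node of the decision tree (the first file) and use that node's split attribute and split value to decide whether to go to the left or the right child node. I understand the concept, but I don't know how to implement it. All I have right now is the code that gets the root of each file: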
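import xml.etree.ElementTree as etree  # assumed import; "from lxml import etree" offers the same parse() call
tree = etree.parse(fileName)
root = tree.getroot()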
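For reference, here is a minimal sketch of the traversal described above. It rests on several assumptions that the files themselves do not confirm: the first child of a splitNode is taken as the left branch and the second as the right branch; for a continuous split, a value less than or equal to splitVal goes left; for a discrete split, a value listed in leftSplit goes left and anything else goes right. The helper name classify and the two shortened inline documents are hypothetical stand-ins for the real files, and the "<" in the attribute values is escaped as &lt; so the parser accepts it.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical stand-ins for the two files.
TREE_XML = """
<rootNode>
  <splitNode leftSplit="Local-gov,Federal-gov,State-gov" rightSplit="Self-emp-inc,Private,Self-emp-not-inc,?" splitAttr="workclass" splitType="discrete">
    <splitNode splitAttr="hours-per-week" splitType="continuous" splitVal="39.53">
      <leafNode incomeLevel="&lt;=50K">Leaf</leafNode>
      <leafNode incomeLevel="&gt;50K">Leaf</leafNode>
    </splitNode>
    <leafNode incomeLevel="&lt;=50K">Leaf</leafNode>
  </splitNode>
</rootNode>
"""

PEOPLE_XML = """
<People>
  <Person hours-per-week="40" workclass="State-gov"/>
  <Person hours-per-week="50" workclass="Private"/>
</People>
"""


def classify(person, node):
    """Walk down from `node` and return the incomeLevel of the leaf reached."""
    while node.tag == "splitNode":
        # Assumption: first child is the left branch, second is the right.
        left, right = list(node)[:2]
        value = person.get(node.get("splitAttr"))
        if node.get("splitType") == "continuous":
            # Assumption: values <= splitVal go to the left child.
            go_left = float(value) <= float(node.get("splitVal"))
        else:
            # Discrete split: go left if the value is listed in leftSplit.
            go_left = value in node.get("leftSplit").split(",")
        node = left if go_left else right
    return node.get("incomeLevel")


tree_root = ET.fromstring(TREE_XML.strip())
people_root = ET.fromstring(PEOPLE_XML.strip())

# The decision tree proper starts at the first splitNode under rootNode.
first_split = tree_root.find("splitNode")
results = [classify(person, first_split) for person in people_root]

for person, predicted in zip(people_root, results):
    print(person.get("workclass"), person.get("hours-per-week"), predicted)
```

With the real files, ET.parse(fileName).getroot() would replace the two ET.fromstring calls. A Person that lacks the attribute named by splitAttr would raise an error in this sketch, so that case would need its own handling.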
Any help you can give would be greatly appreciated!