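I am parsing an XML file with Node.js and RegExp, but I can't find a way to extract all the children of a given parent. For example, I need every FormalName="(.+)" under the parent PARENT1: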
<TopicSet FormalName="PARENT1">
<Topic>
<TopicType FormalName="Child1" />
</Topic>
<Topic>
<TopicType FormalName="Child2" />
</Topic>
<Topic>
<TopicType FormalName="Child3" />
</Topic>
</TopicSet>
<TopicSet FormalName="PARENT2">
<Topic>
<TopicType FormalName="Child1" />
</Topic>
<Topic>
<TopicType FormalName="Child2" />
</Topic>
<Topic>
<TopicType FormalName="Child3" />
</Topic>
</TopicSet>
I tried this:
<TopicSet FormalName="PARENT1">(?:(?:\s|\S)*?)TopicType FormalName="(.+)"(?:(?:\s|\S)*?)<\/TopicSet>
But it returns only the first match for PARENT1 (Child1), not Child1, Child2 and Child3.
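Answer 0 (score: 4)
Parsing XML with regular expressions is not advisable.
Instead of a regular expression, you can use DOMParser, for example with querySelectorAll, to get the FormalName values inside PARENT1.
Example using jsdom: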
let xml = `<TopicSet FormalName="PARENT1">
<Topic>
<TopicType FormalName="Child1" />
</Topic>
<Topic>
<TopicType FormalName="Child2" />
</Topic>
<Topic>
<TopicType FormalName="Child3" />
</Topic>
</TopicSet>
<TopicSet FormalName="PARENT2">
<Topic>
<TopicType FormalName="Child1" />
</Topic>
<Topic>
<TopicType FormalName="Child2" />
</Topic>
<Topic>
<TopicType FormalName="Child3" />
</Topic>
</TopicSet>`;
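// DOMParser is not a Node.js global; here it is taken from a jsdom window.
const { JSDOM } = require("jsdom");
const { DOMParser } = new JSDOM("").window;
let parser = new DOMParser();
// Well-formed XML needs a single root element, so wrap the two sibling TopicSet elements.
let doc = parser.parseFromString(`<root>${xml}</root>`, "text/xml");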
let res = doc.querySelectorAll("TopicSet[FormalName='PARENT1'] Topic TopicType");
res.forEach(e => console.log(e.getAttribute("FormalName")));
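Answer 1 (score: 0)
Doing this with a regular expression is probably not the best idea. However, if you have to, you can create three capture groups, using the parent's opening and closing tags as the left and right boundaries and capturing everything in between: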
(<TopicSet.*?>)([\s\S]*?)(<\/TopicSet>)
If this is not the expression you want, you can modify it on regex101.com.
You can also visualize your expression on jex.im. A JavaScript demo:
const regex = /(<TopicSet.*?>)([\s\S]*?)(<\/TopicSet>)/mg;
const str = `<TopicSet FormalName="PARENT1">
<Topic>
<TopicType FormalName="Child1" />
</Topic>
<Topic>
<TopicType FormalName="Child2" />
</Topic>
<Topic>
<TopicType FormalName="Child3" />
</Topic>
</TopicSet>
<TopicSet FormalName="PARENT2">
<Topic>
<TopicType FormalName="Child1" />
</Topic>
<Topic>
<TopicType FormalName="Child2" />
</Topic>
<Topic>
<TopicType FormalName="Child3" />
</Topic>
</TopicSet>`;
const subst = `$2`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);
If you also want the parent tags printed, simply substitute $1$2$3 instead of $2; the extra capture groups are there to make them easy to reference:
const regex = /(<TopicSet.*?>)([\s\S]*?)(<\/TopicSet>)/mg;
const str = `<TopicSet FormalName="PARENT1">
<Topic>
<TopicType FormalName="Child1" />
</Topic>
<Topic>
<TopicType FormalName="Child2" />
</Topic>
<Topic>
<TopicType FormalName="Child3" />
</Topic>
</TopicSet>
<TopicSet FormalName="PARENT2">
<Topic>
<TopicType FormalName="Child1" />
</Topic>
<Topic>
<TopicType FormalName="Child2" />
</Topic>
<Topic>
<TopicType FormalName="Child3" />
</Topic>
</TopicSet>`;
const subst = `$1$2$3`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);
If you only want to extract the first parent, you can add another boundary:
(<TopicSet FormalName="PARENT1">)([\s\S]*?)(<\/TopicSet>)
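That last expression still captures the whole body of PARENT1 as one group, so the child names have to be pulled out in a second pass. Here is a minimal two-step sketch of how that could look. The helper name childNames and the shortened sample string are made up for illustration, and it assumes the parent name contains no regex metacharacters and that TopicSet elements are never nested:

```javascript
// Hypothetical helper: returns the FormalName of every TopicType
// inside the TopicSet whose FormalName equals parentName.
// Assumes parentName has no regex metacharacters and TopicSet is not nested.
function childNames(xml, parentName) {
  // Step 1: isolate the body of the requested parent (lazy match up to its closing tag).
  const parentRe = new RegExp(
    `<TopicSet FormalName="${parentName}">([\\s\\S]*?)</TopicSet>`
  );
  const parent = parentRe.exec(xml);
  if (parent === null) return [];

  // Step 2: collect every child attribute inside that body with a global regex.
  const childRe = /<TopicType FormalName="([^"]+)"/g;
  return Array.from(parent[1].matchAll(childRe), (m) => m[1]);
}

// Shortened sample, not the full XML from the question.
const sample = `<TopicSet FormalName="PARENT1">
<Topic>
<TopicType FormalName="Child1" />
</Topic>
<Topic>
<TopicType FormalName="Child2" />
</Topic>
</TopicSet>
<TopicSet FormalName="PARENT2">
<Topic>
<TopicType FormalName="Child9" />
</Topic>
</TopicSet>`;

console.log(childNames(sample, "PARENT1"));
```

Splitting the work this way sidesteps the original problem: a capture group in a single match holds only one value, while a global regex run over the isolated parent body yields one match per child. String.prototype.matchAll needs Node.js 12 or later. For anything beyond simple, predictable input, a real XML parser is still the safer choice.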