I am using libxml2 in an iOS app to parse the following HTML (it is part of a larger page):
...
<span class="ingredient">
<span class="amount">
<span class="value">500 </span>
<span class="type">g</span>
</span>
<a href="...">bread flour</a>
or
<span class="ingredient">
<span class="amount">
<span class="value">500 </span>
<span class="type">g</span>
</span>
<span class="name">
<a href="...">all-purpose flour</a>
</span>
</span>
</span>
...
I only need to extract the text: "500 g bread flour or 500 g all-purpose flour".
This is the parsed NSDictionary result returned for the XPath query //span[@class="ingredient"]:
{
nodeAttributeArray = (
{
attributeName = class;
nodeContent = ingredient;
}
);
nodeChildArray = (
{
nodeAttributeArray = (
{
attributeName = class;
nodeContent = amount;
}
);
nodeChildArray = (
{
nodeAttributeArray = (
{
attributeName = class;
nodeContent = value;
}
);
nodeContent = 500;
nodeName = span;
},
{
nodeAttributeArray = (
{
attributeName = class;
nodeContent = type;
}
);
nodeContent = g;
nodeName = span;
}
);
nodeContent = "";
nodeName = span;
},
{
nodeAttributeArray = (
{
attributeName = href;
nodeContent = "http://www.food.com/library/flour-64";
}
);
nodeContent = "bread flour";
nodeName = a;
},
{
nodeAttributeArray = (
{
attributeName = class;
nodeContent = ingredient;
}
);
nodeChildArray = (
{
nodeAttributeArray = (
{
attributeName = class;
nodeContent = amount;
}
);
nodeChildArray = (
{
nodeAttributeArray = (
{
attributeName = class;
nodeContent = value;
}
);
nodeContent = 500;
nodeName = span;
},
{
nodeAttributeArray = (
{
attributeName = class;
nodeContent = type;
}
);
nodeContent = g;
nodeName = span;
}
);
nodeContent = "";
nodeName = span;
},
{
nodeAttributeArray = (
{
attributeName = class;
nodeContent = name;
}
);
nodeChildArray = (
{
nodeAttributeArray = (
{
attributeName = href;
nodeContent = "http://www.food.com/library/flour-64";
}
);
nodeContent = "all-purpose flour";
nodeName = a;
}
);
nodeContent = "";
nodeName = span;
}
);
nodeContent = "";
nodeName = span;
}
);
nodeContent = or;
nodeName = span;
}
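The problem is that the "nodeContent" of the dictionary root is the text "or", while all the tags appear as children of the root node, so the order of the fragments is lost. I cannot tell that "or" actually belongs in the middle of the text, and when I concatenate everything I get the string "or 500 g bread flour 500 g all-purpose flour".
Can anyone suggest a way to extract the plain text with a single XPath query, or a way to use the XPath engine to read an ordered list of elements?
Answer 0 (score: 0)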
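When you need all the text nodes, this is easily done with
//text()
which returns every text node in document order. To restrict it to the ingredient, prefix it with your existing query: //span[@class="ingredient"]//text()
Your content also contains whitespace-only text nodes; you can omit them with
//text()[normalize-space()]
libxml2 implements XPath 1.0, which has no regular-expression functions such as matches(), so normalize-space() is the portable way to test for printable content.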
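After that you will still need to do some trimming in Objective-C (for example on "500 ", which carries a trailing space), but you should get an ordered result set of all text nodes that contain printable characters.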