Question

I'm new to Web-Harvest and am using it to extract article data from a website with the following statement:

let $text := data($doc//div[@id="articleBody"])

This is the data I get from the statement above:

The Refine Spa (Furman's Mill) was built as a stone grist mill along the on a tributary of Capoolong Creek by Moore Furman, quartermaster general of George Washington's army

Notable people

Notable current and former residents of Pittstown include:

My question is: can I use the configuration to remove everything after "Notable people"? If this is possible, please tell me how. Thanks.

Desired output after the change:

The Refine Spa (Furman's Mill) was built as a stone grist mill along the on a tributary of Capoolong Creek by Moore Furman, quartermaster general of George Washington's army

Notable people

Answer 1

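You just need to change your let statement to:

let $text := substring-before(data($doc//div[@id="articleBody"]), 'Notable people')

This keeps only the text that comes before "Notable people". Note that substring-before does not include the search string itself, so if you want to keep the heading, as in the desired output, wrap the call in concat(..., 'Notable people').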
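For context, here is a minimal sketch of how that expression might sit inside a full Web-Harvest configuration. It assumes the Web-Harvest 2 `xquery` processor, with the page passed in through an `xq-param` named `doc`. The variable name `articleText` and the URL are hypothetical placeholders, so check the element names against the manual for your Web-Harvest version. The query also guards against pages that have no "Notable people" heading, where `substring-before` would otherwise return an empty string.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<config>
    <!-- Hypothetical variable name and URL: replace them with your own -->
    <var-def name="articleText">
        <xquery>
            <!-- Fetch the page, clean it into XML and bind it to $doc -->
            <xq-param name="doc">
                <html-to-xml>
                    <http url="http://example.com/article"/>
                </html-to-xml>
            </xq-param>
            <xq-expression><![CDATA[
                declare variable $doc as node() external;
                let $body := data($doc//div[@id="articleBody"])
                return
                    (: Cut everything after the heading, then re-append the heading;
                       if the heading is missing, return the body unchanged :)
                    if (contains($body, 'Notable people'))
                    then concat(substring-before($body, 'Notable people'), 'Notable people')
                    else $body
            ]]></xq-expression>
        </xquery>
    </var-def>
</config>
```

If you don't need the heading in the result, return `substring-before($body, 'Notable people')` directly instead of the `concat(...)` call.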

How to strip part of the text obtained with Web-Harvest
