Question

<tr><th>Biography</th>
<td>   A bunch of random info here   </td>

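I want to get everything after the Biography line and the opening tag on the next line. Without the newline character in brackets I tried (?<=Biography\n).{1,50}, and with it in brackets (?<=Biography[\n]).{1,50}. I'm not sure either would grab all the characters starting from the next line, but in any case neither of them returns anything. What is the correct way to match newlines in a string of HTML data?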
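
A quick way to see why neither pattern returns anything, sketched in Python. The question doesn't name a language, so Python's re module is an assumption here, as is writing the snippet's second <td> as the closing tag </td>:

```python
import re

# Assumed input: the snippet from the question, with the closing tag as </td>.
html = "<tr><th>Biography</th>\n<td>   A bunch of random info here   </td>\n"

# The original attempts: both require "Biography" to be followed directly by a
# newline, but in the data it is followed by "</th>", so neither can match.
# (\n and [\n] mean the same thing here.)
attempt_1 = re.search(r"(?<=Biography\n).{1,50}", html)
attempt_2 = re.search(r"(?<=Biography[\n]).{1,50}", html)
print(attempt_1, attempt_2)  # None None

# Putting the closing tag inside the lookbehind makes it match. Without
# re.DOTALL, "." stops at the next newline, so this grabs the whole next line.
next_line = re.search(r"(?<=Biography</th>\n).{1,50}", html)
print(repr(next_line.group()))  # '<td>   A bunch of random info here   </td>'

# To get only the cell text, capture between the tags instead.
# This is tied to this exact markup and is not a general HTML solution.
cell = re.search(r"Biography</th>\s*<td>\s*(.*?)\s*</td>", html)
print(repr(cell.group(1)))  # 'A bunch of random info here'
```

So the newline itself is matched fine by \n or [\n]; the lookbehind fails because of the </th> in between. The last pattern breaks as soon as the markup changes (attributes, extra tags, different whitespace), which is why a real parser is the safer route.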

Answer 1

Never parse HTML with regex!

A solution using a proper parser:

$ saxon-lint --html --xpath '//*[.="Biography"]/../td/text()' file
A bunch of random info here

Check out https://github.com/sputnick-dev/saxon-lint
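
If saxon-lint isn't available, the same parser-based idea can be sketched with Python's standard-library html.parser module. CellAfterHeader and cells_after_header are hypothetical names for this sketch, and it only approximates the XPath above: it returns the <td> cells that follow a matching <th> in the same row, with surrounding whitespace stripped. The sample input assumes the snippet's second <td> is meant to be </td>.

```python
from html.parser import HTMLParser


class CellAfterHeader(HTMLParser):
    """Collect the text of <td> cells that follow a <th> whose text equals
    `label` within the same <tr>. Hypothetical helper for illustration."""

    def __init__(self, label):
        super().__init__()
        self.label = label
        self.cell = None              # "th" or "td" while inside a cell
        self.chunks = []              # text pieces of the open cell
        self.in_labelled_row = False  # True after the matching <th> in this row
        self.results = []

    def _finish_cell(self):
        # Runs when a cell ends, explicitly (</td>) or implicitly
        # (HTML lets a new <td>, <th> or <tr> close the previous cell).
        if self.cell is None:
            return
        text = "".join(self.chunks).strip()
        if self.cell == "th" and text == self.label:
            self.in_labelled_row = True
        elif self.cell == "td" and self.in_labelled_row:
            self.results.append(text)
        self.cell = None
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("tr", "th", "td"):
            self._finish_cell()
        if tag == "tr":
            self.in_labelled_row = False  # a new row starts unlabelled
        elif tag in ("th", "td"):
            self.cell = tag

    def handle_endtag(self, tag):
        if tag == self.cell:
            self._finish_cell()
        elif tag in ("tr", "table"):
            self._finish_cell()
            self.in_labelled_row = False

    def handle_data(self, data):
        # Text (including newlines) only matters while inside a cell.
        if self.cell is not None:
            self.chunks.append(data)

    def close(self):
        super().close()
        self._finish_cell()  # flush a cell left open at end of input


def cells_after_header(html, label):
    parser = CellAfterHeader(label)
    parser.feed(html)
    parser.close()
    return parser.results


snippet = "<tr><th>Biography</th>\n<td>   A bunch of random info here   </td>\n"
print(cells_after_header(snippet, "Biography"))  # ['A bunch of random info here']
```

Because the parser works on tags rather than lines, the newline between </th> and <td> never needs special handling.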

Regex for parsing HTML NewLine
