Question

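I am trying to extract the text from the HTML fragment below. I need help with a regex pattern that replaces all the HTML tags and leaves only the content.

I tried the expression below to remove <span*>, but it did not work.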

 String x = '<span style="font-size:11pt;"><span style="line-height:107%;"><span style="font-family:Calibri, sans-serif;"><strong><font color="#000000">Some normal text here...</font></strong></span></span></span>';
 String y = x.replaceAll('[<span*\b>]','');
 system.debug(y);

This prints:

  tyle="fot-ize:11t;" tyle="lie-height:107%;" tyle="fot-fmily:Clibri, -erif;"trogfot color="#000000"Some normal text here.../fot/trog///

So it basically replaced each character individually, rather than everything from <span to >.

Any help is appreciated.

Answer 1

The second line of code should be:

String y = x.replaceAll('<span[^>]*>','');

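This statement means: for every occurrence of '<span' followed by any number (*) of characters other than '>' ([^>]) and then a closing '>', replace the match with nothing. Your original pattern put everything inside square brackets, which makes it a character class, so each listed character was removed wherever it appeared.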
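The difference between the two patterns can be sketched in plain Java, on the assumption that Apex's String.replaceAll follows the same regular expression semantics as Java's. The class and method names below are hypothetical and only serve the illustration.

```java
public class SpanDemo {
    // Removes only opening <span ...> tags: the literal "<span", then any
    // run of characters that are not '>', then the closing '>'.
    static String removeOpeningSpans(String html) {
        return html.replaceAll("<span[^>]*>", "");
    }

    public static void main(String[] args) {
        String x = "<span style=\"font-size:11pt;\"><span style=\"line-height:107%;\">"
                 + "<strong>Some normal text here...</strong></span></span>";
        // prints: <strong>Some normal text here...</strong></span></span>
        System.out.println(removeOpeningSpans(x));
    }
}
```

Note that the closing </span> tags are still present in the result, because the pattern only matches tags that start with '<span'.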

By the way, this will miss the closing </span> tags. I am mentioning it only for your information, since you did not ask about it.
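To cover the closing tags as well, or every tag as the question originally asked, the pattern can be widened. The following is a minimal sketch in plain Java, again assuming Apex regex behaves like Java's; the class and method names are hypothetical. It is not a full HTML parser: a '>' inside an attribute value, comments, or script blocks would confuse it.

```java
public class TagStripper {
    // Removes both opening and closing span tags; "/?" makes the slash optional.
    static String removeSpans(String html) {
        return html.replaceAll("</?span[^>]*>", "");
    }

    // Removes anything that looks like a tag: '<', one or more non-'>' characters, '>'.
    static String stripAllTags(String html) {
        return html.replaceAll("<[^>]+>", "");
    }

    public static void main(String[] args) {
        String x = "<span style=\"font-size:11pt;\"><span style=\"line-height:107%;\">"
                 + "<span style=\"font-family:Calibri, sans-serif;\"><strong>"
                 + "<font color=\"#000000\">Some normal text here...</font>"
                 + "</strong></span></span></span>";
        // prints: <strong><font color="#000000">Some normal text here...</font></strong>
        System.out.println(removeSpans(x));
        // prints: Some normal text here...
        System.out.println(stripAllTags(x));
    }
}
```

In Apex itself the same patterns would go inside single-quoted string literals, as in the question. Apex's String class also documents a stripHtmlTags() method, which may be a simpler choice than a hand-written regex.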
