Question

I have a CSV that I'm trying to reformat. It contains some HTML, but the HTML has commas in it, which makes life difficult.

How can I use a regular expression to replace the commas within the HTML markup with an HTML entity?

So far I've tried things like >(.+?),(.+?)< to no avail.

I'll probably do the actual replacement in a text editor, most likely Atom.

Edit: here is an example:

U,4,EXAMPLESKU,<font face="Times New Roman" size="3">  <p align="center"><font face="Times New Roman" size="3"><strong><span style="font-size: medium;">&nbsp;<span style="font-size: medium;">Example</span></span></strong></font></p>  <p align="center"><font face="Times New Roman" size="3">Content goes in here, including commas, sometimes multiple.</font><a href="mailto:email@example.com"><font face="Times New Roman" size="3">email@example.com</font></a><font face="Times New Roman" size="3">. <br/>  Some more content here, including commas, sometimes multiple.</font>&nbsp;&nbsp; </p>  </font>,image.jpg,9.99,Example,3~53,0.00,0,0,0,0.500,2,1
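A minimal Python sketch of the replacement, under two assumptions the question does not state: each row holds exactly one HTML field, and no other field contains < or >. With a pattern like >(.+?),(.+?)<, each match rewrites at most one comma, and the lazy .+? groups can still run across tag boundaries. Here the greedy pattern <.*> instead grabs everything from the first < to the last > on the line, and a callback replaces every comma inside that span with &#44;, the numeric character reference for a comma. The names HTML_SPAN and escape_html_commas are hypothetical.

```python
import re

# Assumptions: one HTML field per row, and no other field contains '<' or '>'.
# The greedy <.*> then runs from the first '<' to the last '>' on the line,
# which covers the whole HTML field but none of the delimiters around it.
HTML_SPAN = re.compile(r"<.*>")


def escape_html_commas(line: str) -> str:
    """Replace every comma inside the HTML span with the entity &#44;."""
    return HTML_SPAN.sub(lambda m: m.group(0).replace(",", "&#44;"), line)


if __name__ == "__main__":
    row = "U,4,SKU,<p>Content, with commas, here.</p>,image.jpg,9.99"
    print(escape_html_commas(row))
    # U,4,SKU,<p>Content&#44; with commas&#44; here.</p>,image.jpg,9.99
```

Applied to the example row above, the delimiter after EXAMPLESKU and the one after the closing </font> sit outside the span and are left alone, so the row splits back into its 15 fields.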

Answer 1

See my post on solving this problem for details.

^(?:(?:"((?:""|[^"])+)"|([^,]*))(?:$|,))+$ will match the entire line, and you can then use match.Groups[1].Captures to get the data (without the quotes). I also allow "My name is ""in quotes""" to be a valid string.
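A sketch of the same idea in Python, for illustration only. Python's re keeps just the last repetition of a captured group, so there is no direct equivalent of .NET's Captures collection; instead, the sketch applies the single-field part of the pattern repeatedly, one field at a time. In that pattern, quoted fields are captured by group 1 and unquoted fields by group 2, so both are checked. Turning "" back into " is an extra step the regex itself does not perform. The names FIELD and split_csv_line are hypothetical. Note that the example row in the question does not wrap its HTML field in double quotes, so this kind of parsing only helps once that field is quoted.

```python
import re

# One field of the whole-line pattern, without the outer (...)+ repetition.
# Group 1: contents of a quoted field (may contain commas and doubled quotes "").
# Group 2: an unquoted field (everything up to the next comma).
FIELD = re.compile(r'(?:"((?:""|[^"])+)"|([^,]*))(?:$|,)')


def split_csv_line(line: str) -> list[str]:
    """Split one CSV line (without its trailing newline) field by field."""
    fields: list[str] = []
    pos = 0
    while True:
        m = FIELD.match(line, pos)
        # The unquoted alternative ([^,]*) followed by "," or end of line
        # can always match, so m is never None here.
        assert m is not None
        if m.group(1) is not None:
            # Quoted field: outer quotes are already excluded; unescape "" -> ".
            fields.append(m.group(1).replace('""', '"'))
        else:
            fields.append(m.group(2))
        # A match that did not consume a delimiter comma reached end of line.
        if not m.group(0).endswith(","):
            return fields
        pos = m.end()


if __name__ == "__main__":
    print(split_csv_line('1,"My name is ""in quotes""",x'))
    # ['1', 'My name is "in quotes"', 'x']
```

A trailing comma yields a final empty field, so "a,b," splits into three fields, the last one empty.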

Matching commas between HTML tags in a CSV
