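Tokenize or split a string into text and HTML tag items

Time: 2017-02-27 18:14:33

Tags: c# .net regex

I'm looking for the most efficient way to take a string and tokenize it, splitting it into an array in which every HTML tag is its own element.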

Example Input (String): 
    "I can format my text so that <strong>This is bold</strong> and this is not."

Desired Output (String[] array): 
    "I can format my text so that",
    "<strong>",
    "This is bold",
    "</strong>",
    "and this is not."

Alternate Output, Just As Good (String[] array): 
    "I",
    "can",
    "format",
    "my",
    "text",
    "so",
    "that",
    "<strong>",
    "This",
    "is",
    "bold",
    "</strong>",
    "and",
    "this",
    "is",
    "not."

I'm not sure of the best way to approach this. Any help would be greatly appreciated.

1 answer:

Answer 0: (score: 0)

You can use Regex.Split() with a pair of zero-width assertions to split before each < and after each >:

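    using System.Text.RegularExpressions;

    string input = "I can format my text so that <strong>This is bold</strong> and this is not.";
    // Split before each '<' and after each '>' without consuming any characters.
    string[] output = Regex.Split(input, "(?=<)|(?<=>)");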

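(?=pattern) is a lookahead assertion: it matches at a position only if pattern follows it. (?<=pattern) is a lookbehind assertion: the same idea, but it checks the characters before the position. Both are zero-width, so the split consumes no characters and the < and > stay attached to their tags.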
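Note that the raw split keeps surrounding whitespace (the first element is "I can format my text so that " with a trailing space), and it can produce empty strings when the input starts with a tag or ends with one. A minimal sketch of one way to get both outputs from the question follows. The class name HtmlTokenizer and both method names are hypothetical, made up for this illustration, and the sketch assumes that < and > only ever appear as tag delimiters (no stray angle brackets in text or attribute values).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
public static class HtmlTokenizer
{
    // Splits before every '<' and after every '>', then trims each piece
    // and drops pieces that are empty after trimming.
    public static string[] SplitTagsAndText(string input)
    {
        return Regex.Split(input, "(?=<)|(?<=>)")
            .Select(piece => piece.Trim())
            .Where(piece => piece.Length > 0)
            .ToArray();
    }

    // Word-level variant: tags stay whole, text runs are split on whitespace.
    // Assumption: any piece that begins with '<' is a tag.
    public static string[] SplitTagsAndWords(string input)
    {
        return SplitTagsAndText(input)
            .SelectMany(piece => piece[0] == '<'
                ? new[] { piece }
                : piece.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
            .ToArray();
    }
}
```

With the example input from the question, SplitTagsAndText should give the five-element "Desired Output" and SplitTagsAndWords the sixteen-element alternate output. For anything beyond simple, well-formed markup, a real HTML parser is likely a safer choice than a regex.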