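Replace the opening and closing parentheses of a certain structure?

Time: 2017-12-07 16:02:04

Tags: c# regex

I am trying to move parentheses from inside a certain tag to outside it, that is, whenever an opening parenthesis immediately follows the opening tag or a closing parenthesis immediately precedes the closing tag. For example: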

<italic>(When a parenthetical sentence stands on its own)</italic>
<italic>(When a parenthetical sentence stands on its own</italic>
<italic>When a parenthetical sentence stands on its own)</italic>

After the replacement, these lines should read:

(<italic>When a parenthetical sentence stands on its own</italic>)
(<italic>When a parenthetical sentence stands on its own</italic>
<italic>When a parenthetical sentence stands on its own</italic>)

However, the following three strings should remain unchanged:

<italic>(When) a parenthetical sentence stands on its own</italic>
<italic>When a parenthetical sentence stands on its (own)</italic>
<italic>When a parenthetical sentence stands (on) its own</italic>

The following strings, on the other hand:

<italic>((When) a parenthetical sentence stands on its own</italic>
<italic>((When) a parenthetical sentence stands on its own)</italic>
<italic>(When) a parenthetical sentence stands on its own)</italic>
<italic>When a parenthetical sentence stands on its (own))</italic>
<italic>(When a parenthetical sentence stands on its (own)</italic>

should read as follows after the replacement:

(<italic>(When) a parenthetical sentence stands on its own</italic>
(<italic>(When) a parenthetical sentence stands on its own</italic>)
<italic>(When) a parenthetical sentence stands on its own</italic>)
<italic>When a parenthetical sentence stands on its (own)</italic>)
(<italic>When a parenthetical sentence stands on its (own)</italic>

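There may be nested tags inside the <italic>...</italic> tag, and a single line may contain more than one <italic>...</italic> string. In addition, if an <italic>...</italic> tag is nested inside <inline-formula>...</inline-formula>, it should be ignored.

Can I do this with a regex? If not, is there another way to do it?

This is my approach so far (I am still not sure whether it covers every possible case):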

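Step 1: <italic>( ---> (<italic>
Find <italic>( unless the tag is followed by a balanced pair of parentheses that is not itself followed by the closing tag. Matching is only allowed within a single line.

Find what: (<(italic)>)(?!(\((?>(?:(?![()\r\n]).)++|(?3))*+\))(?!</$2\b))(\()
Replace with: $4$1

Step 2: )</italic> ---> </italic>)
Find )</italic> unless the closing tag is preceded by a balanced pair of parentheses that is not itself preceded by the opening tag. Matching is only allowed within a single line.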

(\))(?<!(?<!<(italic)>)(\((?>(?:(?![()\r\n]).)++|(?3))*+\)))(</\2\b>)

1 answer:

Answer 0 (score: 1)

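You can do this in a few different ways. I would start by working out when each parenthesis can be moved:

  1. If the text inside the tag starts with ( and that parenthesis is either closed immediately before the closing tag or never closed, the opening parenthesis can be moved in front of the opening tag.
  2. If the text inside the tag ends with ) and that parenthesis is either opened immediately after the opening tag or never opened, the closing parenthesis can be moved behind the closing tag.

This problem seems well suited to a parser that tracks the parenthesis state (whether the tag text begins with a parenthesis, and how deeply nested the parentheses are at the current point). Writing a parser lets us build the replacement constructively instead of searching for and replacing substrings with a regex, and it handles nesting naturally through recursion. Doing this with a regex seems rather complicated. This is what I came up with: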

    using System;
    using System.IO;
    using System.Text;
    
    namespace ParenParser {
        public class Program
        {
            public static Stream GenerateStreamFromString(string s)
            {
                MemoryStream stream = new MemoryStream();
                StreamWriter writer = new StreamWriter(stream);
                writer.Write(s);
                writer.Flush();
                stream.Position = 0;
                return stream;
            }
    
            public static String Process(StreamReader s) { // root
                StringBuilder output = new StringBuilder();
                while (!s.EndOfStream) {
                    var ch = Convert.ToChar(s.Read());
                    if (ch == '<') {
                        output.Append(ProcessTag(s, true));
                    } else {
                        output.Append(ch);
                    }
                }
    
                return output.ToString();
            }
    
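            // Reads one element (opening tag, text, closing tag) from the stream and
            // returns it with a leading "(" and/or trailing ")" moved outside the tags
            // when the parenthesis depth count allows it. Nested tags are handled by recursion.
            public static String ProcessTag(StreamReader s, bool skipOpeningBracket = true) {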
                int currentParenDepth = 0;
                StringBuilder openingTag = new StringBuilder(), allTagText = new StringBuilder(), closingTag = new StringBuilder();
                bool inOpeningTag = false, inClosingTag = false;
                if (skipOpeningBracket) {
                    inOpeningTag = true;
                    openingTag.Append('<');
                    skipOpeningBracket = false;
                }
    
                while (!s.EndOfStream) {
                    var ch = Convert.ToChar(s.Read());
                    if (ch == '<') { // start of a tag
                        var nextCh = s.Peek(); // int; Peek() returns -1 at end of stream, which Convert.ToChar would reject
                        if (nextCh == '/') { // closing tag!
                            closingTag.Append(ch);
                            inClosingTag = true;
                        } else if (openingTag.ToString().Length != 0) { // already seen a tag, recurse
                            allTagText.Append(ProcessTag(s, true));
                            continue;
                        } else {
                            openingTag.Append(ch);
                            inOpeningTag = true;
                        }
                    }
                    else if (inOpeningTag) {
                        openingTag.Append(ch);
                        if (ch == '>') {
                            inOpeningTag = false;
                        }
                    }
                    else if (inClosingTag) {
                        closingTag.Append(ch);
                        if (ch == '>') {
                            // Done!
                            var allTagTextString = allTagText.ToString();
                            if (allTagTextString.Length > 0 && allTagTextString[0] == '(' && allTagTextString[allTagTextString.Length - 1] == ')' && currentParenDepth == 0) {
                                return "(" + openingTag.ToString() + allTagTextString.Substring(1, allTagTextString.Length - 2) + closingTag.ToString() + ")";
                            } else if (allTagTextString.Length > 0 && allTagTextString[0] == '(' && currentParenDepth > 0) { // unclosed
                                return "(" + openingTag.ToString() + allTagTextString.Substring(1, allTagTextString.Length - 1) + closingTag.ToString();
                            } else if (allTagTextString.Length > 0 && allTagTextString[allTagTextString.Length - 1] == ')' && currentParenDepth < 0) { // unopened
                                return openingTag.ToString() + allTagTextString.Substring(0, allTagTextString.Length - 1) + closingTag.ToString() + ")";
                            } else {
                                return openingTag.ToString() + allTagTextString + closingTag.ToString();
                            }
                        }
                    }
                    else
                    {
                        allTagText.Append(ch);
                        if (ch == '(') {
                            currentParenDepth++;
                        }
                        else if (ch == ')') {
                            currentParenDepth--;
                        }
                    }
                }
    
                return openingTag.ToString() + allTagText.ToString() + closingTag.ToString();
            }
    
            public static void Main()
            {
                var testCases = new String[] {
                    // Should change
                    "<italic>(When a parenthetical sentence stands on its own)</italic>",
                    "<italic>(When a parenthetical sentence stands on its own</italic>",
                    "<italic>When a parenthetical sentence stands on its own)</italic>",
    
                    // Should remain unchanged
                    "<italic>(When) a parenthetical sentence stands on its own</italic>",
                    "<italic>When a parenthetical sentence stands on its (own)</italic>",
                    "<italic>When a parenthetical sentence stands (on) its own</italic>",
    
                    // Should be changed
                    "<italic>((When) a parenthetical sentence stands on its own</italic>",
                    "<italic>((When) a parenthetical sentence stands on its own)</italic>",
                    "<italic>(When) a parenthetical sentence stands on its own)</italic>",
                    "<italic>When a parenthetical sentence stands on its (own))</italic>",
                    "<italic>(When a parenthetical sentence stands on its (own)</italic>",
    
                    // Other cases
                    "<italic>(Try This on!)</italic>",
                    "<italic><italic>(Try This on!)</italic></italic>",
                    "<italic></italic>",
                    "",
                    "()",
                    "<italic>()</italic>",
                    "<italic>"
                };
    
                foreach(var testCase in testCases) {
                    using(var testCaseStreamReader = new StreamReader(GenerateStreamFromString(testCase))) {
                        Console.WriteLine(testCase + " --> " + Process(testCaseStreamReader));
                    }
                }
            }
        }
    }
    

The test cases produce output like this:

    <italic>(When a parenthetical sentence stands on its own</italic> --> (<italic>When a parenthetical sentence stands on its own</italic>
    <italic>When a parenthetical sentence stands on its own)</italic> --> <italic>When a parenthetical sentence stands on its own</italic>)
    <italic>(When) a parenthetical sentence stands on its own</italic> --> <italic>(When) a parenthetical sentence stands on its own</italic>
    <italic>When a parenthetical sentence stands on its (own)</italic> --> <italic>When a parenthetical sentence stands on its (own)</italic>
    <italic>When a parenthetical sentence stands (on) its own</italic> --> <italic>When a parenthetical sentence stands (on) its own</italic>
    <italic>((When) a parenthetical sentence stands on its own</italic> --> (<italic>(When) a parenthetical sentence stands on its own</italic>
    <italic>((When) a parenthetical sentence stands on its own)</italic> --> (<italic>(When) a parenthetical sentence stands on its own</italic>)
    <italic>(When) a parenthetical sentence stands on its own)</italic> --> <italic>(When) a parenthetical sentence stands on its own</italic>)
    <italic>When a parenthetical sentence stands on its (own))</italic> --> <italic>When a parenthetical sentence stands on its (own)</italic>)
    <italic>(When a parenthetical sentence stands on its (own)</italic> --> (<italic>When a parenthetical sentence stands on its (own)</italic>
    <italic>(Try This on!)</italic> --> (<italic>Try This on!</italic>)
    <italic><italic>(Try This on!)</italic></italic> --> (<italic><italic>Try This on!</italic></italic>)
    <italic></italic> --> <italic></italic>
     --> 
    () --> ()
    <italic>()</italic> --> (<italic></italic>)
    <italic> --> <italic>
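
The two conditions stated at the top of this answer can also be checked by pairing up the parentheses explicitly, because a depth counter alone cannot tell which parenthesis pairs with which: the text `(a) b (c)` ends at depth 0 although its first and last parentheses do not belong together. The following is only a minimal sketch of that idea, not part of the program above. The names `ParenRules`, `MatchParens`, `MoveParens` and `ProcessLine` are hypothetical, and it assumes tags without attributes, one line at a time, and no `<italic>` nested directly inside another `<italic>`. It also skips any `<italic>` inside `<inline-formula>`, as the question requests, by letting the regex consume those elements whole.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; assumes attribute-free tags on one line.
public static class ParenRules
{
    // The first alternative swallows <inline-formula> elements whole, so an
    // <italic> inside them never reaches the second alternative.
    private static readonly Regex Element = new Regex(
        @"<inline-formula>.*?</inline-formula>|<italic>(?<inner>.*?)</italic>");

    // partner[i] is the index of the parenthesis paired with the one at i,
    // or -1 if position i is not a parenthesis or has no partner.
    public static int[] MatchParens(string text)
    {
        var partner = new int[text.Length];
        var open = new Stack<int>();
        for (int i = 0; i < text.Length; i++)
        {
            partner[i] = -1;
            if (text[i] == '(')
            {
                open.Push(i);
            }
            else if (text[i] == ')' && open.Count > 0)
            {
                int j = open.Pop();
                partner[i] = j;
                partner[j] = i;
            }
        }
        return partner;
    }

    // Applies both conditions to the text between one pair of tags.
    public static string MoveParens(string openTag, string inner, string closeTag)
    {
        if (inner.Length == 0) return openTag + closeTag;
        int last = inner.Length - 1;
        int[] partner = MatchParens(inner);

        // A leading "(" moves out if it is unclosed or closed by the very last character.
        bool moveOpen = inner[0] == '(' && (partner[0] == -1 || partner[0] == last);
        // A trailing ")" moves out if it is unopened or opened by the very first character.
        bool moveClose = inner[last] == ')' && (partner[last] == -1 || partner[last] == 0);

        int start = moveOpen ? 1 : 0;
        int end = moveClose ? last : inner.Length; // exclusive
        return (moveOpen ? "(" : "") + openTag + inner.Substring(start, end - start)
             + closeTag + (moveClose ? ")" : "");
    }

    // Rewrites every <italic> element on a single line, leaving <inline-formula> untouched.
    public static string ProcessLine(string line)
    {
        return Element.Replace(line, m =>
            m.Groups["inner"].Success
                ? MoveParens("<italic>", m.Groups["inner"].Value, "</italic>")
                : m.Value);
    }
}
```

With this sketch, `ParenRules.ProcessLine("<italic>(a) b (c)</italic>")` returns its input unchanged, while `ParenRules.ProcessLine("<italic>(When) a parenthetical sentence stands on its own)</italic>")` returns `<italic>(When) a parenthetical sentence stands on its own</italic>)`. Parentheses inside nested tags are treated as ordinary text here, which may or may not be what you want.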