Question


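Regex to replace a single quote with two single quotes: every ' inside an <xsl: tag must become '', while a ' anywhere else should stay as it is.
Code snippet: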

public static void main(String[] args) {
        String replaceSingleQuoteInsideXsltCondition = "(<\\s*?xsl\\s*?:.*?=.*?)(')(.*?)(')(.*?>)";
        String dummyXSLT = "<p>Thank you for sending us <xsl:for-each select=\"catalog/cd[artist='Bob Dylan']\"> " +
                "paper's to prove your <span class=\"highlight\"><xsl:if test=\"D01 ='Y'\">Income</xsl:if></span> <span class=\"highlight\"><xsl:if test=\"D02 ='Y'\">&#160;and&#160;" +
                "</xsl:if></span><span class=\"highlight\"><xsl:if test=\"D03 ='Y'\">Citizenship and/or Identity</xsl:if></span>. " +
                "We need a little more information to finish your application. Addition of few words like 7 o'clock, employees' or employ's and child's and 'xyz and 'hello'</p>" +
                "contact number for inquiry = '478965152' and email id = 'pqr@xyz'" +
                "<xsl:template match=\"num[ . = 3 or . = 5]\"/></xsl:stylesheet><xsl:if test=\"contains($search, 'Web Developer') and (contains($expSearch, 'Computer') or contains($expSearch, 'Information') or contains($expSearch, 'Web' ))\">" +
                "<xsl:if test=\"((node/ABC!='') and (normalize-space(node/DEF)='') and (normalize-space(node/GHI)=''))\"> just a dummy sample.</xsl:if>";
        System.out.println(dummyXSLT.replaceAll(replaceSingleQuoteInsideXsltCondition,  "$1''$3''$5"));
    }

Actual result of the code above:

<p>Thank you for sending us <xsl:for-each select="catalog/cd[artist=''Bob Dylan'']"> paper's to prove your <span class="highlight"><xsl:if test="D01 =''Y''">Income</xsl:if></span> <span class="highlight"><xsl:if test="D02 =''Y''">&#160;and&#160;</xsl:if></span><span class="highlight"><xsl:if test="D03 =''Y''">Citizenship and/or Identity</xsl:if></span>. We need a little more information to finish your application. Addition of few words like 7 o'clock, employees' or employ's and child's and 'xyz and 'hello'</p>contact number for inquiry = '478965152' and email id = 'pqr@xyz'<xsl:template match="num[ . = 3 or . = 5]"/></xsl:stylesheet><xsl:if test="contains($search, ''Web Developer'') and (contains($expSearch, 'Computer') or contains($expSearch, 'Information') or contains($expSearch, 'Web' ))"><xsl:if test="((node/ABC!='''') and (normalize-space(node/DEF)='') and (normalize-space(node/GHI)=''))"> just a dummy sample.</xsl:if>

Expected result:

<p>Thank you for sending us <xsl:for-each select="catalog/cd[artist=''Bob Dylan'']"> paper's to prove your <span class="highlight"><xsl:if test="D01 =''Y''">Income</xsl:if></span> <span class="highlight"><xsl:if test="D02 =''Y''">&#160;and&#160;</xsl:if></span><span class="highlight"><xsl:if test="D03 =''Y''">Citizenship and/or Identity</xsl:if></span>. We need a little more information to finish your application. Addition of few words like 7 o'clock, employees' or employ's and child's and 'xyz and 'hello'</p>contact number for inquiry = '478965152' and email id = 'pqr@xyz'<xsl:template match="num[ . = 3 or . = 5]"/></xsl:stylesheet><xsl:if test="contains($search, ''Web Developer'') and (contains($expSearch, ''Computer'') or contains($expSearch, ''Information'') or contains($expSearch, ''Web'' ))"><xsl:if test="((node/ABC!='''') and (normalize-space(node/DEF)='''') and (normalize-space(node/GHI)=''''))"> just a dummy sample.</xsl:if>

Answer 1

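I think this can be done with two different regex replacements, one of them in a loop (the "g" modifier does not help for that one).

Here is the concept for a Java implementation of the use case:

First replace all '' with '''', once, but globally.
Then replace (<xsl([^>']|'')+)'(([^>']|[^>']+'')+)'(([^'>])+) with \1''\3''\5, not globally but in a loop, until it no longer replaces anything.
If that works, the next step is to make it accept both xsl and XSL and to allow the optional whitespace you want:
(<\\s*(xsl|XSL)([^>']|'')+)'(([^>']|[^>']+'')+)'(([^'>])+)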

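I am not a Java man (pun intended), so I cannot provide a demonstrator in Java. Here is a sed demonstrator instead (you do not need it; it only shows what I tested). It implements the concept above and produces the desired output for the given sample input.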

bash-3.1$ sed -En "1{s/''/''''/g;:a;s/(<xsl([^>']|'')+)'(([^>']|[^>']+'')+)'(([^'>])+)/\1''\3''\5/;ta;p};" input.txt > output.txt

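The main trick is to search for something that cannot occur in the parts that have already been replaced successfully, and to keep looping for as long as a replacement succeeds. The second trick is to first replace everything that needs replacing but already looks as if it had been replaced ('' -> '''').

Note:
Although Java and sed may use different regex flavors, I see nothing obviously conflicting when I compare your regex with mine. I do not even use \s, \d, \w or anything similar. You may have to use $1''$3''$5 instead of \1''\3''\5.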
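
The two steps above could be sketched in Java roughly as follows. This is only a sketch under assumptions: the class and method names (XslQuoteLoop, doubleQuotes) are made up for illustration, the pattern is the one from this answer copied unchanged, and the behaviour was reasoned through only for short tags like those in the question. Note that step 1 doubles every '' in the whole input, not just inside <xsl: tags, which is harmless for the sample but may matter for other input.

```java
import java.util.regex.Pattern;

class XslQuoteLoop {
    // Pattern from the answer: one pair of still-single quotes inside an <xsl...> tag.
    // Groups 1, 3 and 5 are the text before, between and after the two quotes.
    private static final Pattern SINGLE_PAIR = Pattern.compile(
            "(<xsl([^>']|'')+)'(([^>']|[^>']+'')+)'(([^'>])+)");

    // Hypothetical helper name; not part of the original question or answer.
    static String doubleQuotes(String input) {
        // Step 1: once, but globally, turn every '' into ''''.
        String current = input.replace("''", "''''");
        // Step 2: replace one pair of single quotes per pass until nothing changes.
        while (true) {
            String next = SINGLE_PAIR.matcher(current).replaceFirst("$1''$3''$5");
            if (next.equals(current)) {
                return current;
            }
            current = next;
        }
    }
}
```

Calling XslQuoteLoop.doubleQuotes(dummyXSLT) with the string from the question would be the intended use. Java writes group references in the replacement as $1, $3, $5, as the note above suspected.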

Answer 2

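If arbitrary nesting of elements is allowed inside the <xsl> </> tags, then this is impossible. See RegEx match open tags except XHTML self-contained tags.

You can design a regex for this specific case, but not for every possible case.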

Answer 3

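This works if you are only parsing tags. If you are trying to interpret HTML closure (matching open and close tags), you cannot do it with the Java implementation of regex.

The basic idea is that you cannot parse only the xsl tags. You have to parse all tags, both to advance the match position and to get past tags that could hide html.

So every tag has to be parsed. In the regex below, capture group 2 contains the xsl tag you are looking for.

All tags will match. You can ignore most of them and just watch for the times when capture group 2 has a length. That is the one you want to manipulate.

What we do is a replace-all with a callback.

Inside the callback:

If capture group 2 did not match (i.e. it has no length),
just return the contents of capture group 0 (the whole match). This simply replaces the match with itself. These are the other tags.
If capture group 2 did match, copy group 2 to a string
and run another regex replace on that string (its contents). This is a global find of (?<!')'(?!') replaced with ''. Return that string as the replacement from the callback.

That is all there is to it.

Now brace yourself, here is the regex.

(Feel free to make it case insensitive if you like.)

Expanded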

 <
 (?:
      (?:
           (?:
                # Invisible content; end tag req'd
                (                             # (1 start)
                     script
                  |  style
                     #|  head
                  |  object
                  |  embed
                  |  applet
                  |  noframes
                  |  noscript
                  |  noembed 
                )                             # (1 end)
                (?:
                     \s+ 
                     (?>
                          " [\S\s]*? "
                       |  ' [\S\s]*? '
                       |  (?:
                               (?! /> )
                               [^>] 
                          )?
                     )+
                )?
                \s* >
           )

           [\S\s]*? </ \1 \s* 
           (?= > )
      )

   |  (?: /? [\w:]+ \s* /? )

   |  (                             # (2 start), The xsl: we want to find
           xsl: [\w:-]* 
           \s+ 
           (?:
                " [\S\s]*? " 
             |  ' [\S\s]*? ' 
             |  [^>]? 
           )+
           \s* /?
      )                             # (2 end)
   |  (?:
           [\w:]+ 
           \s+ 
           (?:
                " [\S\s]*? " 
             |  ' [\S\s]*? ' 
             |  [^>]? 
           )+
           \s* /?
      )
   |  \? [\S\s]*? \?
   |  (?:
           !
           (?:
                (?: DOCTYPE [\S\s]*? )
             |  (?: \[CDATA\[ [\S\s]*? \]\] )
             |  (?: -- [\S\s]*? -- )
             |  (?: ATTLIST [\S\s]*? )
             |  (?: ENTITY [\S\s]*? )
             |  (?: ELEMENT [\S\s]*? )
           )
      )
 )
 >

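A final note: to get a feel for how effective and fast this regex is, take a large html source and run a global find, replacing every match with nothing (an empty string). You will then see all the content, completely stripped of html.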
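
The callback idea from this answer could be sketched in Java (9 or later) roughly like this. It is a sketch under assumptions, not the answer's own code: the TAG pattern is a much simpler stand-in for the full regex above (it only skips quoted attribute values and knows nothing about script, comments, CDATA and so on), so the xsl tag lands in capture group 1 here rather than group 2, and the class and method names are hypothetical. Because the stand-in group excludes the surrounding < and >, the callback adds them back.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class XslQuoteCallback {
    // Simplified stand-in for the full tag regex (an assumption of this sketch).
    // Group 1 is set only for opening xsl: tags that carry attributes.
    private static final Pattern TAG = Pattern.compile(
            "<(?:(xsl:[\\w:-]*\\s+(?:\"[^\"]*\"|'[^']*'|[^>\"'])*)"
            + "|(?:\"[^\"]*\"|'[^']*'|[^>\"'])*)>");

    // The inner find from the answer: a ' with no ' directly before or after it.
    private static final Pattern LONE_QUOTE = Pattern.compile("(?<!')'(?!')");

    // Hypothetical helper name; not part of the original answer.
    static String doubleQuotes(String input) {
        return TAG.matcher(input).replaceAll(m -> {
            String xsl = m.group(1);
            if (xsl == null) {
                // Any other tag: put back exactly what was matched.
                return Matcher.quoteReplacement(m.group());
            }
            // xsl tag: double each lone quote inside it.
            String fixed = LONE_QUOTE.matcher(xsl).replaceAll("''");
            // quoteReplacement keeps $search and similar text literal.
            return Matcher.quoteReplacement("<" + fixed + ">");
        });
    }
}
```

One consequence of the lone-quote pattern is that an existing '' (as in node/ABC!='') is left untouched, whereas the expected result in the question shows '''' there. Replacing every ' inside the captured tag, instead of only lone ones, would be the variant to try if that output is required.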

