
Question

I have a requirement. I have a string whose value is, for example:

<p>We are pleased <a href="http://www.anc.com/content/cy-tech/global/en/cq5-reference-materials.s_cid_123.html">to present the new product type</a>. This new product type is the best thing since sliced bread. We are pleased to present the new product type. This new product <a href="mailto:abc@gmail.com">type is the best</a> thing since sliced bread.</p>

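The text above is stored as a single string value. After checking certain criteria, I need to append some parameters to the hrefs. How can I extract just the hrefs, append the parameters, and output the string without corrupting it? (FYI: the string is the value entered through an RTE, a rich text editor.)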

I tried this approach, but it didn't work:

String tmpStr = "href=\"http://www.abc.com\">design";

StringBuffer tmpStrBuff = new StringBuffer();
String[] tmpStrSpt = tmpStr.split(">");
if (tmpStrSpt[0].contains("abc.com")) {
    String[] tmpStrSpt1 = tmpStrSpt[0].split("\"");
    tmpStrBuff.append(tmpStrSpt1[0]);
    if (tmpStrSpt1[1].contains("?")) {
        tmpStrBuff.append("\"" + tmpStrSpt1[1] + "&s_cid=abcd_xyz\">");
    } else {
        tmpStrBuff.append("\"" + tmpStrSpt1[1] + "?s_cid=abcd_xyz\">");
    }
    tmpStrBuff.append(tmpStrSpt[1]);
    tmpStrBuff.append("</a>");
    System.out.println(" <p>tmpStr1:::: " + tmpStrBuff.toString() + "</p>");
}

Another approach I tried:

String[] tmpTxtArr = text.split("\\s+");
StringBuffer tmpStrBuff = new StringBuffer();
for (String tmpTxt : tmpTxtArr) {
    descTxt += (tmpTxt.contains("abc.com") && !tmpTxt.contains("?")) ? tmpTxt
            .replace("\">", "?s_cid=" + trackingCode + "\">" + " ")
            : tmpTxt + " ";
}

Answer 1

Description

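This regular expression will:

Find the href attribute inside an anchor tag.
Require the href to start with http://abc.com. It also accepts https and www.abc.com in their respective positions.
If the URL contains a ?, capture it into capture group 3.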

<a\b[^<]*\bhref=(['"])(https?:\/\/(?:www[.])?abc[.]com[^"'?]*?([?]?)[^"'?]*?)\1[^<]*<\/a>
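To check the properties listed above, the regex can be compiled as-is in Java and tried against a few anchors. This is a minimal sketch: the class name `HrefPatternDemo` and the sample URLs are hypothetical, chosen only to exercise the http/https, optional www. and query-string cases.

```java
import java.util.regex.Pattern;

class HrefPatternDemo {
    // The answer's regex, written as a Java string literal (quotes and backslashes escaped).
    static final Pattern ANCHOR = Pattern.compile(
        "<a\\b[^<]*\\bhref=(['\"])(https?:\\/\\/(?:www[.])?abc[.]com[^\"'?]*?([?]?)[^\"'?]*?)\\1[^<]*<\\/a>",
        Pattern.CASE_INSENSITIVE);

    // True if the whole input is a single anchor that the pattern accepts.
    static boolean accepts(String anchor) {
        return ANCHOR.matcher(anchor).matches();
    }

    public static void main(String[] args) {
        // Hypothetical sample anchors.
        String[] samples = {
            "<a href=\"http://abc.com/page.html\">plain</a>",
            "<a href=\"https://www.abc.com/page.html\">https + www</a>",
            "<a href=\"http://www.abc.com/page.html?x=1\">with query</a>",
            "<a href=\"mailto:abc@gmail.com\">email</a>",
            "<a href=\"http://www.other.com/page.html\">other host</a>"
        };
        for (String s : samples) {
            System.out.println(accepts(s) + "  " + s);
        }
    }
}
```

The first three anchors are accepted; the mailto link and the other host are not.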


Groups

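Group 0 contains the entire anchor, from the opening <a to the closing </a>. If you find that too much, or it collides with nested anchor tags, simply remove [^<]*<\/a> from the end of the expression.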
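A small sketch of the difference, assuming a made-up sample paragraph: `FULL` is the pattern as given, and `TRIMMED` is the same pattern with `[^<]*<\/a>` removed, so group 0 stops at the closing quote of the href. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupZeroDemo {
    // Everything up to and including the closing quote (\1) of the href.
    static final String HEAD =
        "<a\\b[^<]*\\bhref=(['\"])(https?:\\/\\/(?:www[.])?abc[.]com[^\"'?]*?([?]?)[^\"'?]*?)\\1";
    // Full pattern: group 0 runs through the closing </a>.
    static final Pattern FULL = Pattern.compile(HEAD + "[^<]*<\\/a>", Pattern.CASE_INSENSITIVE);
    // Trimmed pattern: group 0 ends at the closing quote of the href.
    static final Pattern TRIMMED = Pattern.compile(HEAD, Pattern.CASE_INSENSITIVE);

    // Returns group 0 of the first match, or null if there is none.
    static String firstGroupZero(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "<p>See <a href=\"http://www.abc.com/a.html\">docs</a>.</p>";
        System.out.println(firstGroupZero(FULL, text));    // <a href="http://www.abc.com/a.html">docs</a>
        System.out.println(firstGroupZero(TRIMMED, text)); // <a href="http://www.abc.com/a.html"
    }
}
```

Groups 1 to 3 are identical in both variants, so code that only edits the href value works with either one.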

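Group 1 captures the opening quote, which is referenced later as \1 to make sure the value is closed by the same kind of quote.
Group 2 captures the href value.
Group 3 captures the question mark, if there is one.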
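The groups listed above can be read back with `Matcher.group(int)`. This minimal sketch, using hypothetical sample anchors and a hypothetical `CaptureGroupsDemo` class, also shows the effect of the `\1` backreference: a single-quoted href matches, while a value opened with `"` and closed with `'` does not.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureGroupsDemo {
    static final Pattern ANCHOR = Pattern.compile(
        "<a\\b[^<]*\\bhref=(['\"])(https?:\\/\\/(?:www[.])?abc[.]com[^\"'?]*?([?]?)[^\"'?]*?)\\1[^<]*<\\/a>",
        Pattern.CASE_INSENSITIVE);

    // Returns {group 1, group 2, group 3} of the first match, or null if nothing matches.
    static String[] groups(String html) {
        Matcher m = ANCHOR.matcher(html);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        // Single quotes work because \1 requires the same quote that opened the value.
        String[] g = groups("<a href='https://abc.com/p?id=7'>single-quoted</a>");
        System.out.println(g[0] + " | " + g[1] + " | " + g[2]); // ' | https://abc.com/p?id=7 | ?
        // Mismatched quotes are rejected.
        System.out.println(groups("<a href=\"https://abc.com/p'>mismatched</a>") == null); // true
    }
}
```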

Java code example:

Given the sample text:

<p>Some <a href="http://www.abc.com/content/cy-tech/global/en/cq5-reference-materials.s_cid_123.html">text</a>. I like kittens <a href="mailto:abc@gmail.com">email us</a>Dogs are nice.</p><a href="http://www.abc.com/content/cy-tech/global/en/cq5-reference-materials.s_cid_123.html?attribute=value">remember to vote</a>

this code

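import java.util.regex.Pattern;
import java.util.regex.Matcher;

class Module1 {
  public static void main(String[] asd) {
    // The sample text from above.
    String sourcestring = "<p>Some <a href=\"http://www.abc.com/content/cy-tech/global/en/cq5-reference-materials.s_cid_123.html\">text</a>. "
        + "I like kittens <a href=\"mailto:abc@gmail.com\">email us</a>Dogs are nice.</p>"
        + "<a href=\"http://www.abc.com/content/cy-tech/global/en/cq5-reference-materials.s_cid_123.html?attribute=value\">remember to vote</a>";
    Pattern re = Pattern.compile("<a\\b[^<]*\\bhref=(['\"])(https?:\\/\\/(?:www[.])?abc[.]com[^\"'?]*?([?]?)[^\"'?]*?)\\1[^<]*<\\/a>", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
    Matcher m = re.matcher(sourcestring);
    int mIdx = 0;
    // Print every capture group (0 through 3) of every match.
    while (m.find()) {
      for (int groupIdx = 0; groupIdx <= m.groupCount(); groupIdx++) {
        System.out.println("[" + mIdx + "][" + groupIdx + "] = " + m.group(groupIdx));
      }
      mIdx++;
    }
  }
}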

Yields

$matches Array:
(
    [0] => Array
        (
            [0] => <a href="http://www.abc.com/content/cy-tech/global/en/cq5-reference-materials.s_cid_123.html">text</a>
            [1] => <a href="http://www.abc.com/content/cy-tech/global/en/cq5-reference-materials.s_cid_123.html?attribute=value">remember to vote</a>
        )

    [1] => Array
        (
            [0] => "
            [1] => "
        )

    [2] => Array
        (
            [0] => http://www.abc.com/content/cy-tech/global/en/cq5-reference-materials.s_cid_123.html
            [1] => http://www.abc.com/content/cy-tech/global/en/cq5-reference-materials.s_cid_123.html?attribute=value
        )

    [3] => Array
        (
            [0] => 
            [1] => ?
        )

)

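From here, simply loop over all the matches: if group 3 has a value, insert & between the href value in group 2 and the new text; if it does not, insert ? instead.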
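The step described above can be sketched by walking the matches and copying the text through, inserting the separator and the new parameter right after the end of group 2. This assumes the `s_cid=abcd_xyz` value from the question's own attempt as the parameter; `HrefParamAppender` and `appendParam` are hypothetical names, and the sample HTML is made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefParamAppender {
    // Same regex as in the answer.
    static final Pattern ANCHOR = Pattern.compile(
        "<a\\b[^<]*\\bhref=(['\"])(https?:\\/\\/(?:www[.])?abc[.]com[^\"'?]*?([?]?)[^\"'?]*?)\\1[^<]*<\\/a>",
        Pattern.CASE_INSENSITIVE);

    // Appends `param` to every matching href: "?" when the URL has no query string yet
    // (group 3 is empty), "&" when it already has one (group 3 is "?").
    static String appendParam(String html, String param) {
        Matcher m = ANCHOR.matcher(html);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy everything up to the end of the href value (group 2) unchanged.
            out.append(html, last, m.end(2));
            out.append(m.group(3).isEmpty() ? "?" : "&").append(param);
            // The closing quote and the rest of the anchor are copied on the next pass.
            last = m.end(2);
        }
        out.append(html, last, html.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<p>Read <a href=\"http://www.abc.com/a.html\">this</a> and "
            + "<a href=\"https://abc.com/b.html?lang=en\">that</a>, or "
            + "<a href=\"mailto:abc@gmail.com\">mail us</a>.</p>";
        System.out.println(appendParam(html, "s_cid=abcd_xyz"));
    }
}
```

Running `main` prints `<p>Read <a href="http://www.abc.com/a.html?s_cid=abcd_xyz">this</a> and <a href="https://abc.com/b.html?lang=en&s_cid=abcd_xyz">that</a>, or <a href="mailto:abc@gmail.com">mail us</a>.</p>`. Copying by index, rather than calling `String.replace` on the whole text, means only the href values the regex actually matched are changed; the mailto link and everything outside the anchors pass through untouched.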

Disclaimer

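In the long run, parsing HTML with regular expressions may not be the easiest thing to maintain. However, if you control the input text, the text stays fairly simple, and you are willing to accept the occasional edge case where the regular expression fails, then a regex will work for you.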

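Some detractors will point out that the following strings are not matched correctly. However, these cases are either illegal or impractical in HTML, so they are unlikely to be encountered.

<a href="http://abc.com?attrib=</a>">link</a>
The extra special characters <, / and > are functional in HTML and need to be escaped; as shown here, this string would violate the HTML standard.
<a href="http://abc.com?attrib=value">outside<a href="http://abc.com?attrib=value2">inside</a></a>
Nested links may be legal, but they force the browser to choose which anchor tag to follow, and I have never seen this format used.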

String manipulation - Rich Text Editor
