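Removing double quotes nested inside other double quotes using Java

Posted: 2019-09-18 15:39:08

Tags: java regex string replace double-quotes

I have a string in which double quotes are nested inside another pair of double quotes.

For example:

Input 1: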

<span style="font-family: pp-sans-big-light, "Noto Sans", Calibri, Trebuchet, Arial, "sans serif"; font-size: 17px; text-align: start; background-color: rgb(255, 255, 255);" class="transaction" name="details"> How are you</span>

Expected output 1:

<span style="font-family: pp-sans-big-light, Noto Sans, Calibri, Trebuchet, Arial, sans serif; font-size: 17px; text-align: start; background-color: rgb(255, 255, 255);" class="transaction" name="details"> How are you</span>

Input 2:

<span title="Conditional (A/B) Content on "Transactions.Recipient Name"" class="transaction" name="details"> Transaction Recipient</span>

Expected output 2:

<span title="Conditional (A/B) Content on Transactions.Recipient Name" class="transaction" name="details"> Transaction Recipient</span>

I tried the following options:

Option 1:

import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public static void main(String[] args) throws Exception {
    int i;
    String title = null, style = null, temp = null;
    // Wrap the input (placeholder below) in a dummy root element so it can be parsed as XML
    String tempNodeValue = "<?xml version=\"1.0\"?><dummyroot>" + /**INPUT_HERE**/ + "</dummyroot>";
//  tempNodeValue = tempNodeValue.replace("\"","&quot;");
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    DocumentBuilder db = dbf.newDocumentBuilder();
    // Throws SAXParseException if the wrapped input is not well-formed XML
    Document document = db.parse(new InputSource(new StringReader(tempNodeValue)));
    NodeList nodeList = document.getElementsByTagName("span");
    for (i = 0; i < nodeList.getLength(); i++) {
        Node node = nodeList.item(i);
        // Replace double quotes in the title value with single quotes
        if (node.getAttributes().getNamedItem("title") != null) {
            title = node.getAttributes().getNamedItem("title").getNodeValue();
            temp = title.replace("\"", "'");
            tempNodeValue = tempNodeValue.replace("&quot;", "\"");
            tempNodeValue = tempNodeValue.replace(title, temp);
        }
        // Same for the style value
        if (node.getAttributes().getNamedItem("style") != null) {
            style = node.getAttributes().getNamedItem("style").getNodeValue();
            temp = style.replace("\"", "'");
            tempNodeValue = tempNodeValue.replace("&quot;", "\"");
            tempNodeValue = tempNodeValue.replace(style, temp);
        }
    }
    System.out.println(tempNodeValue);
}
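A likely reason Option 1 fails is that neither input is well-formed XML: in `title="Conditional (A/B) Content on "Transactions..."`, the first nested quote already ends the attribute value, so `DocumentBuilder.parse` throws before any attribute can be read. This is a minimal sketch that wraps a fragment the same way Option 1 does and reports whether it parses; `XmlParseCheck` and `isWellFormed` are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: does the fragment parse once wrapped like in Option 1?
class XmlParseCheck {
    static boolean isWellFormed(String fragment) {
        String xml = "<?xml version=\"1.0\"?><dummyroot>" + fragment + "</dummyroot>";
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            db.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // SAXParseException (a subclass) is thrown for malformed markup such as nested quotes
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<span title=\"plain\">ok</span>"));
        System.out.println(isWellFormed("<span title=\"Conditional (A/B) Content on \"Transactions.Recipient Name\"\" class=\"transaction\" name=\"details\"> Transaction Recipient</span>"));
    }
}
```

The default parser may also print a `[Fatal Error]` line to stderr for the second call; the return value is what matters.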

Option 2:

public static void main(String[] args) throws Exception {
    String tempNodeValue = /**INPUT_HERE**/;
    // Match four double quotes (with optional text between them) and keep only the outer pair
    tempNodeValue = tempNodeValue.replaceAll("\"(\\b[^\"]+|\\s+)?\"(\\b[^\"]+\\b)?\"([^\"]+\\b|\\s+)?\"", "\"$1$2$3\"");
    System.out.println(tempNodeValue);
}

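I also tried jsoup, but none of these worked. Option 2 works for Input 2 but not for Input 1, and Option 1 does not work at all. Can anyone help? I have gone through all the existing answers on Stack Overflow and none of them helped.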

1 Answer:

Answer (score: 0)

**Updated**

My previous answer didn't work, but this is an interesting problem and I think I have found a solution.

First, identify where the quotes you want to keep start and end. This regex does that:

 ">|\"? [a-z]+="

If you split the string on this regex, none of the quotes left in the resulting substrings are ones you need to keep.

 let originalString = '<span title="Conditional (A/B) Content on "Transactions.Recipient Name"" class="transaction" name="details"> Transaction Recipient</span>';
 originalString.split(/">|\"? [a-z]+="/);

returns:

 let attributeContents = [
      "<span",
      "Conditional (A/B) Content on \"Transactions.Recipient Name\"",
      "transaction",
      "details",
      " Transaction Recipient</span>"
 ];
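Since the question is about Java, this is a rough Java equivalent of that split, assuming `java.util.regex.Pattern`; the class `BoundarySplit` and method `splitOnBoundaries` are hypothetical names. `Pattern.split` gives the same result as `String.split` with the same regex, and it drops trailing empty strings, but none occur for this input, so it should produce the same five pieces.

```java
import java.util.regex.Pattern;

// Hypothetical demo of splitting on the answer's boundary regex in Java.
class BoundarySplit {
    // `">` (end of the last attribute) or an optional quote followed by ` name="`
    static final Pattern BOUNDARY = Pattern.compile("\">|\"? [a-z]+=\"");

    static final String INPUT_2 = "<span title=\"Conditional (A/B) Content on \"Transactions.Recipient Name\"\" class=\"transaction\" name=\"details\"> Transaction Recipient</span>";

    static String[] splitOnBoundaries(String html) {
        return BOUNDARY.split(html);
    }

    public static void main(String[] args) {
        // Prints each piece in brackets:
        // [<span]
        // [Conditional (A/B) Content on "Transactions.Recipient Name"]
        // [transaction]
        // [details]
        // [ Transaction Recipient</span>]
        for (String piece : splitOnBoundaries(INPUT_2)) {
            System.out.println("[" + piece + "]");
        }
    }
}
```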

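Now all you have to do is loop over these substrings and, for each one that contains quotes, replace it in the original string with the same substring stripped of its quotes.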

 for (let attributeValue of attributeContents) {
      // Replace the first occurrence of the substring with a quote-free copy of itself
      originalString = originalString.replace(attributeValue, attributeValue.replace(/"/g, ""));
 }
 // The nested double quotes have now been removed from the original string.
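Putting the split and the replace loop together in Java gives roughly the following. It is a sketch under the answer's assumptions, not a general HTML repair; `NestedQuoteStripper` and `stripNestedQuotes` are hypothetical names. One difference from the JavaScript: Java's `String.replace(CharSequence, CharSequence)` replaces every occurrence rather than only the first, which makes no difference for these inputs because each quoted piece appears only once.

```java
import java.util.regex.Pattern;

// Hypothetical Java port of the answer's split-and-replace approach.
class NestedQuoteStripper {
    // Boundaries of the quotes to keep: `">` closes the tag, and `"? name="`
    // is an optional closing quote followed by the next attribute.
    // Assumes attribute names consist of lowercase ASCII letters only.
    private static final Pattern BOUNDARY = Pattern.compile("\">|\"? [a-z]+=\"");

    static String stripNestedQuotes(String html) {
        String result = html;
        for (String piece : BOUNDARY.split(html)) {
            if (piece.contains("\"")) {
                // Literal (non-regex) replacement of the piece with a quote-free copy
                result = result.replace(piece, piece.replace("\"", ""));
            }
        }
        return result;
    }

    static final String INPUT_1 = "<span style=\"font-family: pp-sans-big-light, \"Noto Sans\", Calibri, Trebuchet, Arial, \"sans serif\"; font-size: 17px; text-align: start; background-color: rgb(255, 255, 255);\" class=\"transaction\" name=\"details\"> How are you</span>";
    static final String INPUT_2 = "<span title=\"Conditional (A/B) Content on \"Transactions.Recipient Name\"\" class=\"transaction\" name=\"details\"> Transaction Recipient</span>";

    public static void main(String[] args) {
        System.out.println(stripNestedQuotes(INPUT_1));
        System.out.println(stripNestedQuotes(INPUT_2));
    }
}
```

Running `main` should print the two expected outputs from the question. Because of the `[a-z]+` assumption, an attribute such as `data-id` or `aria-label` is not recognized as a boundary, so its surrounding quotes would be treated as nested and removed; a literal `">` inside an attribute value would also split in the wrong place.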