How to replace strings with a regular expression in Java using nested groups

Asked: 2016-06-23 14:15:06

Tags: java regex string

I have lines in the following format:

"123","45","{"VFO":[B501], "AGN":[605,B501], "AXP":[665], "QAV":[720,223R,251Q,496M,548A,799M]}","4"

A line can be longer, but it always contains

"number","number","someValues","digit"

I need to wrap the values inside someValues in quotes.

For the test string, the expected result is:

"123","45","{"VFO":["B501"], "AGN":["605","B501"], "AXP":["665"], "QAV":["720","223R","251Q","496M","548A","799M"]}","4"

Please suggest the simplest solution in Java.

P.S.

My attempt:

                        String valuePattern = "\\[(.*?)\\]";
                        Pattern valueR = Pattern.compile(valuePattern);
                        Matcher valueM = valueR.matcher(line);
                        List<String> list = new ArrayList<String>();
                        while (valueM.find()) {
                            list.add(valueM.group(0));
                        }
                        String value = "";
                        for (String element : list) {
                            element = element.substring(1, element.length() - 1);
                            String[] strings = element.split(",");
                            String singleGroup = "[";
                            for (String el : strings) {
                                singleGroup += "\"" + el + "\",";
                            }
                            singleGroup = singleGroup.substring(0, singleGroup.length() - 1);
                            singleGroup = singleGroup + "]";
                            value += singleGroup;
                        }
                        System.out.println(value);

2 answers:

Answer 0 (score: 1)

EDITED

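OK, this is the shortest way I found. It seems to work fine, except that I have to put the commas and brackets back in by hand. Someone may be able to do it in one go, but I found replacing nested groups tricky.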

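import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteValues {
  public static void main(String[] args) {
    // Group 2 is a value right after '[', group 4 is a value right after ','
    Pattern p = Pattern.compile("(\\[(\\w+))|(,(\\w+))");
    Matcher m = p.matcher("\"123\",\"45\",\"{\"VFO\":[B501], \"AGN\":[605,B501], \"AXP\":[665], \"QAV\":[720,223R,251Q,496M,548A,799M]}\",\"4\"");
    StringBuffer s = new StringBuffer();
    while (m.find()) {
      if (m.group(2) != null) {
        // Put back the '[' consumed by the match, followed by the quoted value
        m.appendReplacement(s, "[\"" + m.group(2) + "\"");
      } else if (m.group(4) != null) {
        // Put back the ',' consumed by the match, followed by the quoted value
        m.appendReplacement(s, ",\"" + m.group(4) + "\"");
      }
    }
    m.appendTail(s);
    System.out.println(s);
  }
}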
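
For this particular input, putting the bracket and comma back by hand could probably be avoided with lookarounds, since a lookbehind and a lookahead check the surrounding characters without consuming them. The following is only a sketch under the assumption that every value consists of word characters (`\w`) and sits directly between `[`, `,` and `]` with no spaces; the class name `LookaroundQuoter` and the method `quoteValues` are made up for illustration.

```java
public final class LookaroundQuoter {
  /**
   * Wraps every run of word characters that is directly preceded by '[' or ','
   * and directly followed by ',' or ']' in double quotes.
   * Assumption: values contain only \w characters and no surrounding spaces.
   */
  public static String quoteValues(String line) {
    return line.replaceAll("(?<=[\\[,])(\\w+)(?=[,\\]])", "\"$1\"");
  }

  public static void main(String[] args) {
    String line = "\"123\",\"45\",\"{\"VFO\":[B501], \"AGN\":[605,B501], "
        + "\"AXP\":[665], \"QAV\":[720,223R,251Q,496M,548A,799M]}\",\"4\"";
    System.out.println(quoteValues(line));
  }
}
```

The numbers outside the brackets are left alone because they are followed by a quote character rather than `,` or `]`. Values containing spaces, quotes, or other punctuation would not be matched by this pattern.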

Answer 1 (score: 0)

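As I commented above, I think the real solution here is to fix whatever is producing this malformed output. I don't believe it can be parsed correctly in the general case: if the strings contain embedded bracket or comma characters, there is no way to tell which part belongs to what.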

However, you can get pretty close by simply ignoring all quote characters and tokenizing the rest:

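import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public final class AlmostJsonSanitizer {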
  enum TokenType {
    COMMA(','),
    COLON(':'),
    LEFT_SQUARE_BRACKET('['),
    RIGHT_SQUARE_BRACKET(']'),
    LEFT_CURLY_BRACKET('{'),
    RIGHT_CURLY_BRACKET('}'),
    LITERAL(null);

    static Map<Character, TokenType> LOOKUP;
    static {
      Map<Character, TokenType> lookup = new HashMap<Character, TokenType>();
      for (TokenType tokenType : values()) {
        lookup.put(tokenType.ch, tokenType);
      }
      LOOKUP = Collections.unmodifiableMap(lookup);
    }

    private final Character ch;

    private TokenType(Character ch) {
      this.ch = ch;
    }
  }

  static class Token {
    final TokenType type;
    final String string;

    Token(TokenType type, String string) {
      this.type = type;
      this.string = string;
    }
  }

  private static class Tokenizer implements Iterator<Token> {
    private final String buffer;
    private int pos;

    Tokenizer(String buffer) {
      this.buffer = buffer;
      this.pos = 0;
    }

    @Override
    public boolean hasNext() {
      return pos < buffer.length();
    }

    @Override
    public Token next() {
      char ch = buffer.charAt(pos);
      TokenType type = TokenType.LOOKUP.get(ch);
      // If it's in the lookup table, return a token of that type
      if (type != null) {
        pos++;
        return new Token(type, null);
      }
      // Otherwise it's a literal: read up to, but not including, the next delimiter
      StringBuilder sb = new StringBuilder();
      while (pos < buffer.length()) {
        ch = buffer.charAt(pos);
        // If we've found a different type of token then stop without consuming it,
        // so that the next call to next() returns that delimiter
        if (TokenType.LOOKUP.get(ch) != null) {
          break;
        }
        pos++;
        // Skip all quote characters
        if (ch != '"') {
          sb.append(ch);
        }
      }
      return new Token(TokenType.LITERAL, sb.toString());
    }

    @Override
    public void remove() {
      throw new UnsupportedOperationException();
    }
  }

  /** Convenience method to allow using a foreach loop below. */
  static Iterable<Token> tokenize(final String input) {
    return new Iterable<Token>() {
      @Override
      public Iterator<Token> iterator() {
        return new Tokenizer(input);
      }
    };
  }

  public static String sanitize(String input) {
    StringBuilder result = new StringBuilder();
    for (Token token : tokenize(input)) {
      switch (token.type) {
        case COMMA:
          result.append(", ");
          break;

        case COLON:
          result.append(": ");
          break;

        case LEFT_SQUARE_BRACKET:
        case RIGHT_SQUARE_BRACKET:
        case LEFT_CURLY_BRACKET:
        case RIGHT_CURLY_BRACKET:
          result.append(token.type.ch);
          break;

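        case LITERAL:
          // Drop surrounding whitespace; a literal that consisted only of
          // quotes or whitespace carries no value and is skipped
          String literal = token.string.trim();
          if (!literal.isEmpty()) {
            result.append('"').append(literal).append('"');
          }
          break;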
      }
    }
    return result.toString();
  }
}

If you like, you can also add some sanity checks, such as making sure the brackets are balanced. That's up to you; this is just an example.
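
As a rough illustration of the bracket-balance check mentioned above, one option is a small stack-based pass over the input. This is only a sketch and not part of the sanitizer; the class name `BracketBalanceCheck` and the method `isBalanced` are hypothetical, and it assumes that only `[ ]` and `{ }` need to be matched.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public final class BracketBalanceCheck {
  /**
   * Returns true if every '[' and '{' is closed by the matching ']' or '}'
   * in the correct order. All other characters are ignored.
   */
  public static boolean isBalanced(String input) {
    Deque<Character> open = new ArrayDeque<>();
    for (char ch : input.toCharArray()) {
      switch (ch) {
        case '[':
        case '{':
          open.push(ch);
          break;
        case ']':
          if (open.isEmpty() || open.pop() != '[') {
            return false;
          }
          break;
        case '}':
          if (open.isEmpty() || open.pop() != '{') {
            return false;
          }
          break;
        default:
          break;
      }
    }
    // Anything left on the stack was opened but never closed
    return open.isEmpty();
  }

  public static void main(String[] args) {
    System.out.println(isBalanced("{\"VFO\":[B501], \"AGN\":[605,B501]}"));
    System.out.println(isBalanced("{\"VFO\":[B501}"));
  }
}
```

Such a check could be run on the input before sanitizing it, rejecting lines whose brackets do not pair up.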