Efficiently replacing strings or characters in the file input for ANTLRInputStream (ANTLRStringStream)

Asked: 2012-04-04 14:26:01

Tags: java pattern-matching antlr3

As I described in "Antlr greedy-option", I have a problem with a language that can contain string literals inside string literals, for example:

START: "img src="test.jpg""

Mr. Bart Kiers pointed out in my post that it is not possible to write a grammar that solves my problem. So I decided to change the input to:

START: "img src='test.jpg'"

before starting the lexer (and parser).

The file input can be:

START: "aaa"aaa"
 "aaa"aaaaa"
:END_START

START: "aaa"aaa"
 "aaa"aa
 a
 aa"
:END_START

START: "aaab"bbaaaa"
:END_START

I have a solution, but it is not correct. I have two questions about it (below the code). My code is:

public static void main(String[] args) {

    try{
        FileInputStream fis = new FileInputStream("src/file.txt");
        String preparedCode = preparingCode(fis);

        ANTLRStringStream in = new ANTLRStringStream(preparedCode);

        TestLexer lex = new TestLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lex);
        TestParser parser = new TestParser(tokens);

        parser.rule();
    }catch(IOException ex){
        ex.printStackTrace();
    } catch (RecognitionException e) {
        System.out.println(e.getMessage());
        System.exit(0);
    }
}

static String preparingCode(FileInputStream input){
    DataInputStream data = new DataInputStream(input);
    StringBuilder oldCode = new StringBuilder();
    StringBuffer newCode = new StringBuffer(oldCode.length());

    Pattern pattern = Pattern.compile("(START:\\s\")(.+)(\"\\n:END_START)");
    String strLine;
    try{
      while ((strLine = data.readLine()) != null)   
          oldCode.append(strLine + "\n");
    }
    catch(IOException ex){
      ex.printStackTrace();
    }

    Matcher matcher = pattern.matcher(oldCode);

    while (matcher.find()) {
      //eliminate quotes inside a string literal
      String stringLiteral = matcher.group(2).replaceAll("\"", "'");

      String replace = matcher.group(1) + stringLiteral + matcher.group(3);
      matcher.appendReplacement(newCode, Matcher.quoteReplacement(replace));
    }
    matcher.appendTail(newCode);

    System.out.println(newCode);

    return newCode.toString();
}


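My questions are:

  • Which pattern is correct? It is important that the string literal can span multiple lines, e.g. "aaaa"\n"bbb", but it is always closed by the sequence "\n:END_START". The result I want is: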
START: "aaa'aaa'
 'aaa'aaaaa"
:END_START

START: "aaa'aaa'
 'aaa'aa
 a
 aa"
:END_START

START: "aaab'bbaaaa"
:END_START

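I experimented with the pattern flag Pattern.DOTALL:

Pattern pattern = Pattern.compile("(START:\\s\")(.+)(\"\\n:END_START)", Pattern.DOTALL);
But this is not a solution, because in this case the greedy .+ matches everything from the first START: up to the last :END_START.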
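
To illustrate why the greedy version overshoots, here is a minimal sketch. The class name GreedyDotallDemo, the helper countMatches and the shortened two-block input are made up for this illustration; only the pattern comes from my code above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDotallDemo {
    // Counts how many separate matches the pattern finds in the input.
    static int countMatches(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Two blocks, shortened from the sample file (assumed input).
        String input = "START: \"aaa\"aaa\"\n:END_START\n\n"
                     + "START: \"aaab\"bbaaaa\"\n:END_START\n";

        // Greedy .+ with DOTALL: the dot also matches line breaks.
        Pattern greedy = Pattern.compile(
                "(START:\\s\")(.+)(\"\\n:END_START)", Pattern.DOTALL);

        Matcher m = greedy.matcher(input);
        if (m.find()) {
            // group(2) starts in the first block and ends in the last one,
            // so it swallows the first ":END_START" and the second "START:".
            System.out.println(m.group(2));
        }
        System.out.println(countMatches(greedy, input)); // prints 1
    }
}
```

With DOTALL the greedy quantifier first consumes the rest of the input and then backtracks only as far as the last closing sequence, so both blocks end up in a single match.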




  • Once I use the correct pattern, is there another efficient way to solve this?



Fix for the first question:
I have to use the non-greedy (reluctant) quantifier together with the pattern flag Pattern.DOTALL:

Pattern pattern = Pattern.compile("(START:\\s\")(.+?)(\"\\n:END_START)", Pattern.DOTALL);

1 Answer:

Answer 0 (score: 0)

Fix for the first question:
I have to use the non-greedy (reluctant) quantifier together with the pattern flag Pattern.DOTALL:

Pattern pattern = Pattern.compile("(START:\\s\")(.+?)(\"\\n:END_START)", Pattern.DOTALL);

Code:

public static void main(String[] args) {

    try{
        FileInputStream fis = new FileInputStream("src/file.txt");
        String preparedCode = preparingCode(fis);

        ANTLRStringStream in = new ANTLRStringStream(preparedCode);

        TestLexer lex = new TestLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lex);
        TestParser parser = new TestParser(tokens);

        parser.rule();
    }catch(IOException ex){
        ex.printStackTrace();
    } catch (RecognitionException e) {
        System.out.println(e.getMessage());
        System.exit(0);
    }
}

static String preparingCode(FileInputStream input){
    // BufferedReader (java.io) replaces the deprecated DataInputStream.readLine()
    BufferedReader data = new BufferedReader(new InputStreamReader(input));
    StringBuilder oldCode = new StringBuilder();
    StringBuffer newCode = new StringBuffer();

    Pattern pattern = Pattern.compile("(START:\\s\")(.+?)(\"\\n:END_START)", Pattern.DOTALL);
    String strLine;
    try{
      while ((strLine = data.readLine()) != null)   
          oldCode.append(strLine + "\n");
    }
    catch(IOException ex){
      ex.printStackTrace();
    }

    Matcher matcher = pattern.matcher(oldCode);

    while (matcher.find()) {
        System.out.println("++++"+matcher.group(2));
      //eliminate quotes inside a string literal
      String stringLiteral = matcher.group(2).replaceAll("\"", "'");

      String replace = matcher.group(1) + stringLiteral + matcher.group(3);
      matcher.appendReplacement(newCode, Matcher.quoteReplacement(replace));
    }
    matcher.appendTail(newCode);

    System.out.println(newCode);

    return newCode.toString();
}
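
As a quick check of the reluctant pattern without a file or the ANTLR classes, here is a minimal self-contained sketch. The class name ReluctantQuoteFix, the method fixQuotes and the in-memory input are assumptions made for this illustration, and it assumes the input uses \n line endings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantQuoteFix {
    // Same pattern as above: reluctant .+? stops at the first closing sequence.
    private static final Pattern BLOCK = Pattern.compile(
            "(START:\\s\")(.+?)(\"\\n:END_START)", Pattern.DOTALL);

    // Replaces the double quotes inside every START block with single quotes.
    static String fixQuotes(String code) {
        Matcher matcher = BLOCK.matcher(code);
        StringBuffer out = new StringBuffer(code.length());
        while (matcher.find()) {
            String literal = matcher.group(2).replace('"', '\'');
            String replacement = matcher.group(1) + literal + matcher.group(3);
            // quoteReplacement keeps '$' and '\' in the text from being
            // interpreted as group references.
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "START: \"aaa\"aaa\"\n \"aaa\"aaaaa\"\n:END_START\n\n"
                     + "START: \"aaab\"bbaaaa\"\n:END_START\n";
        System.out.print(fixQuotes(input));
    }
}
```

Each block is now matched separately, and only the outer pair of double quotes survives in each one.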

So, is there any other way to solve this problem?
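
One possible alternative, offered only as a sketch and not measured against the regex version, is a single pass with String.indexOf instead of a pattern. The class name IndexOfQuoteFix and the constants OPEN and CLOSE are hypothetical. Unlike the regex, this version assumes exactly one space after "START:" and \n line endings.

```java
public class IndexOfQuoteFix {
    // Assumed fixed delimiters; the regex version accepts any whitespace after "START:".
    private static final String OPEN = "START: \"";
    private static final String CLOSE = "\"\n:END_START";

    // Replaces the double quotes inside every START block with single quotes.
    static String fixQuotes(String code) {
        StringBuilder out = new StringBuilder(code.length());
        int pos = 0;
        while (true) {
            int open = code.indexOf(OPEN, pos);
            if (open < 0) {
                break; // no further block
            }
            int bodyStart = open + OPEN.length();
            int close = code.indexOf(CLOSE, bodyStart);
            if (close < 0) {
                break; // unterminated block: the rest is copied unchanged below
            }
            // Copy everything up to and including the opening quote.
            out.append(code, pos, bodyStart);
            // Copy the literal body with its inner quotes replaced.
            out.append(code.substring(bodyStart, close).replace('"', '\''));
            out.append(CLOSE);
            pos = close + CLOSE.length();
        }
        out.append(code, pos, code.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "START: \"aaab\"bbaaaa\"\n:END_START\n";
        System.out.print(fixQuotes(input));
    }
}
```

The returned string can be passed to new ANTLRStringStream(...) exactly as preparedCode is in the code above.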