Pattern matching parser

Date: 2013-02-18 18:15:08

Tags: java regex parsing

I have written Java code to parse the following text file:

     (FAMIX.Attribute (id: 22)
(name 'obj_I')
(parentType (ref: 11))
(declaredType (ref: 27))
(isPrivate true)
   )

   (FAMIX.Attribute (id: 38)
(name 'obj_k')
(parentType (ref: 34))
(declaredType (ref: 43))
(isPrivate true)
   )

  (FAMIX.Attribute (id: 56)
(name 'obj_K')
(parentType (ref: 46))
(declaredType (ref: 43))
(isPrivate true)
    )

  (FAMIX.Attribute (id: 73)
(name 'obj_L')
(parentType (ref: 64))
(declaredType (ref: 45))
(isPrivate true)
    )

 (FAMIX.Attribute (id: 67)
(name 'obj_G')
(parentType (ref: 64))
(declaredType (ref: 46))
(isPrivate true)
    )

 (FAMIX.Attribute (id: 93)
(name 'classD')
(parentType (ref: 85))
(declaredType (ref: 94))
(isPrivate true)
   )

  (FAMIX.Attribute (id: 99)
(name 'classC')
(parentType (ref: 86))
(declaredType(ref: 86))
(isPackage true)
    )

 (FAMIX.Attribute (id: 114)
(name 'classB')
(parentType (ref: 94))
(declaredType (ref: 11))
(isPrivate true)
    )

  (FAMIX.Attribute (id: 107)
(name 'obj_c')
(parentType (ref: 94))
(declaredType (ref: 86))
(isPrivate true)
     )

Java code:

// Find Attributes
Pattern p111 = Pattern.compile("FAMIX.Attribute");
Matcher m111 = p111.matcher(line);
while (m111.find()) {

    FAMIXAttribute obj = new FAMIXAttribute();

    // id from the header line, e.g. "(FAMIX.Attribute (id: 22)"
    Pattern p222 = Pattern.compile("id:\\s*([0-9]+)");
    Matcher m222 = p222.matcher(line);
    while (m222.find()) {
        System.out.print(m222.group(1));
    }

    // Read the record's body lines until a line containing "FAMIX" is reached
    while ((line = br.readLine()) != null && !(line.contains("FAMIX"))) {

        // name, e.g. "(name 'obj_I')"
        Pattern p333 = Pattern.compile("name\\s*'([\\w]+)\\s*'");
        Matcher m333 = p333.matcher(line);
        while (m333.find()) {
            System.out.print(m333.group(1));
        }

        // parentType reference, e.g. "(parentType (ref: 11))"
        Pattern p555 = Pattern.compile("parentType\\s*\\(ref:\\s*([0-9]+)\\)");
        Matcher m555 = p555.matcher(line);
        while (m555.find()) {
            System.out.print(m555.group(1));
        }

        // declaredType reference, e.g. "(declaredType (ref: 27))"
        Pattern p666 = Pattern.compile("declaredType\\s*\\(ref:\\s*([0-9]+)\\)");
        Matcher m666 = p666.matcher(line);
        while (m666.find()) {
            System.out.print(m666.group(1));
        }
    }

} // exit from finding Attribute

Output:

     ***************** Attributes *****************
       obj_k    38   34   43
       obj_L    73   64   45
       classD   93   85   94
       classB   114  94   11   

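As the output shows, the parser skips some of the records (it jumps over them): only every other attribute (ids 38, 73, 93 and 114) is printed.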

If the question is unclear, let me know and I will try to clarify further.

2 answers:

Answer 0 (score: 0)

You forgot the regular expressions that check the isPrivate and isPackage parts.
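A minimal sketch of such a pattern, assuming the visibility flag always appears as "(isPrivate true)" or "(isPackage true)" on a single line, as in the sample file. The class name VisibilityFlag and the method parseFlag are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VisibilityFlag {
    // Matches "(isPrivate true)", "(isPackage false)", etc.
    // Group 1 = flag name, group 2 = boolean value.
    private static final Pattern FLAG =
            Pattern.compile("\\((isPrivate|isPackage)\\s+(true|false)\\)");

    /** Returns e.g. "isPrivate=true", or null if the line carries no flag. */
    public static String parseFlag(String line) {
        Matcher m = FLAG.matcher(line);
        if (m.find()) {
            return m.group(1) + "=" + m.group(2);
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(parseFlag("(isPrivate true)"));  // isPrivate=true
        System.out.println(parseFlag("(isPackage true)"));  // isPackage=true
        System.out.println(parseFlag("(name 'classC')"));   // null
    }
}
```

In the question's loop this would sit next to the name, parentType and declaredType blocks and be applied to each body line in the same way.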

Edit: A few steps will show you what is going wrong. Print each line to see exactly which lines fail and how the pattern sees them:

// Find Attributes
System.out.println("***" + line + "***");
Pattern p111 = Pattern.compile("FAMIX.Attribute");
Matcher m111 = p111.matcher(line);
while (m111.find()) {

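The "***" markers show you exactly where Java thinks each line starts and ends. Sometimes characters that look identical to the eye are different to the matcher.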
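One common case of such look-alike characters is a non-breaking space (U+00A0). The sketch below is an illustration, not something known to be in the poster's file: it wraps a line in "***" markers, lists any non-ASCII code points, and shows that the question's id pattern fails on a non-breaking space, because Java's \s matches only ASCII whitespace unless Pattern.UNICODE_CHARACTER_CLASS is set. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class InvisibleChars {
    // The question's id pattern; by default \s matches only ASCII whitespace.
    private static final Pattern ID = Pattern.compile("id:\\s*([0-9]+)");
    // Same pattern, but \s also matches Unicode whitespace such as U+00A0.
    private static final Pattern ID_UNICODE =
            Pattern.compile("id:\\s*([0-9]+)", Pattern.UNICODE_CHARACTER_CLASS);

    /** Wraps the line in *** markers and appends every non-ASCII code point found. */
    public static String describe(String line) {
        StringBuilder sb = new StringBuilder("***" + line + "***");
        line.codePoints()
            .filter(cp -> cp > 0x7F)
            .forEach(cp -> sb.append(String.format(" [U+%04X]", cp)));
        return sb.toString();
    }

    public static boolean hasId(String line) {
        return ID.matcher(line).find();
    }

    public static boolean hasIdUnicode(String line) {
        return ID_UNICODE.matcher(line).find();
    }

    public static void main(String[] args) {
        String plain = "(FAMIX.Attribute (id: 38)";
        String nbsp = "(FAMIX.Attribute (id:\u00A038)"; // non-breaking space after "id:"
        System.out.println(describe(plain) + " " + hasId(plain));
        System.out.println(describe(nbsp) + " " + hasId(nbsp) + " " + hasIdUnicode(nbsp));
    }
}
```

The second line printed ends with "[U+00A0] false true": the default pattern misses the id, while the Unicode-aware one finds it.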

Edit 2: Your code is missing the outer loop in which the first line is read. Do you realize that this code:

while ((line = br.readLine()) != null && !(line.contains("FAMIX"))) {

consumes the next line in which "FAMIX.Attribute" appears? If you then read another line in the (missing) outer loop, you will lose every other record.
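The effect described above can be reproduced with a small sketch. It assumes an outer loop that calls readLine() on every iteration (the question does not show its outer loop, so this is a guess at its shape) and trims the records down to the id and name lines. The buggy variant loses every other record; the fixed variant keeps the header line that ended the inner loop and processes it without reading again. SkipDemo, buggy and fixed are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SkipDemo {
    private static final Pattern ID = Pattern.compile("id:\\s*([0-9]+)");

    public static final String INPUT = String.join("\n",
            "(FAMIX.Attribute (id: 22)", "(name 'obj_I')", ")",
            "(FAMIX.Attribute (id: 38)", "(name 'obj_k')", ")",
            "(FAMIX.Attribute (id: 56)", "(name 'obj_K')", ")");

    /** Outer loop reads a line; inner loop reads until the next "FAMIX" line. */
    public static List<String> buggy(BufferedReader br) throws IOException {
        List<String> ids = new ArrayList<>();
        String line;
        while ((line = br.readLine()) != null) {            // outer read
            if (!line.contains("FAMIX.Attribute")) {
                continue;                                    // skip non-header lines
            }
            Matcher m = ID.matcher(line);
            if (m.find()) ids.add(m.group(1));
            // Stops on the NEXT header, which is already consumed here,
            // so the outer readLine() moves past it.
            while ((line = br.readLine()) != null && !line.contains("FAMIX")) {
                // field parsing would go here
            }
        }
        return ids;
    }

    /** Reuses the header line that ended the inner loop instead of reading past it. */
    public static List<String> fixed(BufferedReader br) throws IOException {
        List<String> ids = new ArrayList<>();
        String line = br.readLine();
        while (line != null) {
            if (!line.contains("FAMIX.Attribute")) {
                line = br.readLine();                        // not a header: advance
                continue;
            }
            Matcher m = ID.matcher(line);
            if (m.find()) ids.add(m.group(1));
            // On exit, 'line' is null or the next header; the outer loop
            // handles it without calling readLine() again.
            while ((line = br.readLine()) != null && !line.contains("FAMIX")) {
                // field parsing would go here
            }
        }
        return ids;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(buggy(new BufferedReader(new StringReader(INPUT))));  // [22, 56]
        System.out.println(fixed(new BufferedReader(new StringReader(INPUT))));  // [22, 38, 56]
    }
}
```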

Answer 1 (score: 0)

If you are sure the file contains lines in the specified format:

  • id, name, parentType and declaredType must each be fully declared on a single line. That is, you do not have input like this:

    (FAMIX.Attribute (id:
    38)
    (name 
    'obj_k')
    (parentType 
      (ref: 34))
    (declaredType (ref: 43))
    (isPrivate true)
    )
    

    But this is allowed:

    (FAMIX.Attribute (id: 38)
    (name 'obj_k') (parentType (ref: 34)) (declaredType (ref: 43)) (isPrivate true))
    

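This is a precondition for the modification below to work. The assumption comes from your current code, which applies each pattern to one line at a time.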
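Why this precondition matters can be checked with a small sketch that applies the question's parentType pattern to each line separately, as the read loop does. Several fields sharing one line are still found, but a field split across two lines is missed because no single line contains the whole match. OneLinePrecondition and findParent are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OneLinePrecondition {
    // The question's parentType pattern.
    private static final Pattern PARENT =
            Pattern.compile("parentType\\s*\\(ref:\\s*([0-9]+)\\)");

    /** Applies the pattern to each line on its own and returns the first ref found, or null. */
    public static String findParent(String... lines) {
        for (String line : lines) {
            Matcher m = PARENT.matcher(line);
            if (m.find()) {
                return m.group(1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Several fields on one line: still found.
        System.out.println(findParent(
                "(name 'obj_k') (parentType (ref: 34)) (declaredType (ref: 43))")); // 34
        // One field split across two lines: no single line matches.
        System.out.println(findParent("(parentType ", "  (ref: 34))"));             // null
    }
}
```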

String line;

FAMIXAttribute obj = new FAMIXAttribute();
boolean isModified = false;

while ((line = br.readLine()) != null) {
    if (line.contains("FAMIX.Attribute")) {
        if (isModified) {
            // TODO: Save the previous obj

            obj = new FAMIXAttribute();
            isModified = false;
        } 
    }

    // TODO: Add the block of code to parse id here
    // TODO: Add id attribute to obj, set isModified to true

    // TODO: Add the block of code to parse other stuffs here
    // TODO: Add those attributes to obj, set isModified to true
}

if (isModified) {
    // TODO: Save the last obj
}
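With the TODOs filled in, the skeleton might look like the following sketch. It rests on several assumptions: FAMIXAttribute here is a hypothetical stand-in with public string fields (the question never shows the real class), "saving" an object means adding it to a list, and the file contains only FAMIX.Attribute records with one field per line, as in the sample. The patterns are the question's own, plus one for the visibility flag.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AttributeParser {
    // Hypothetical stand-in for the question's FAMIXAttribute class.
    public static class FAMIXAttribute {
        public String id, name, parentType, declaredType, visibility;

        @Override
        public String toString() {
            return name + " " + id + " " + parentType + " " + declaredType + " " + visibility;
        }
    }

    private static final Pattern ID = Pattern.compile("id:\\s*([0-9]+)");
    private static final Pattern NAME = Pattern.compile("name\\s*'(\\w+)\\s*'");
    private static final Pattern PARENT = Pattern.compile("parentType\\s*\\(ref:\\s*([0-9]+)\\)");
    private static final Pattern DECLARED = Pattern.compile("declaredType\\s*\\(ref:\\s*([0-9]+)\\)");
    private static final Pattern FLAG = Pattern.compile("\\((isPrivate|isPackage)\\s+true\\)");

    /** Returns group 1 of the first match in the line, or null. */
    private static String first(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    // Assumes only FAMIX.Attribute records, each field on a single line.
    public static List<FAMIXAttribute> parse(BufferedReader br) throws IOException {
        List<FAMIXAttribute> result = new ArrayList<>();
        FAMIXAttribute obj = new FAMIXAttribute();
        boolean isModified = false;
        String line;

        while ((line = br.readLine()) != null) {
            // A new header starts a new object: save the previous one first.
            if (line.contains("FAMIX.Attribute") && isModified) {
                result.add(obj);
                obj = new FAMIXAttribute();
                isModified = false;
            }
            String v;
            if ((v = first(ID, line)) != null)       { obj.id = v;           isModified = true; }
            if ((v = first(NAME, line)) != null)     { obj.name = v;         isModified = true; }
            if ((v = first(PARENT, line)) != null)   { obj.parentType = v;   isModified = true; }
            if ((v = first(DECLARED, line)) != null) { obj.declaredType = v; isModified = true; }
            if ((v = first(FLAG, line)) != null)     { obj.visibility = v;   isModified = true; }
        }
        if (isModified) {
            result.add(obj); // save the last object
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String input = String.join("\n",
                "(FAMIX.Attribute (id: 22)", "(name 'obj_I')", "(parentType (ref: 11))",
                "(declaredType (ref: 27))", "(isPrivate true)", ")",
                "(FAMIX.Attribute (id: 99)", "(name 'classC')", "(parentType (ref: 86))",
                "(declaredType(ref: 86))", "(isPackage true)", ")");
        for (FAMIXAttribute a : parse(new BufferedReader(new StringReader(input)))) {
            System.out.println(a);
        }
        // obj_I 22 11 27 isPrivate
        // classC 99 86 86 isPackage
    }
}
```

Because there is only one readLine() call, no header line is consumed twice, so no record is skipped; the isModified flag is what lets the last record be saved after the loop ends.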