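How do I extract the body of an asterisk-indented JavaDoc block from source code?

Time: 2014-03-07 16:16:44

Tags: java regex

I have a block of text in a string that looks like this.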

/** 
* Comment section for Asset Record config OnStatusChange
*
* Updated for HT342408  Set Assetmeters to inactive when the asset they are
* associated with is retired. This will also cause the condition monitoring
* point associated with the meter to be displayed as inactive.
*/


if (ASSET.retired_date.isnull){
  ASSET.retired_date = new Date();
}

var meterset = ASSET.ASSETMETER;
for (var x = 0; x < meterset.length; x++){
//println('*********meterset['+ x + '].assetmeterid' + meterset[x].assetmeterid);
meterset[x].active = false; 
//println('*********meterset['+ x + '].active' + meterset[x].active);
}

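What I want to do is extract the javadoc-style comment text at the top, that is, everything between /** and */. So far I have been unable to figure out how to do this in Java. My latest attempt uses a Pattern, but it does not seem to match. Can anyone offer some help on how to solve this easily?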

Pattern p = Pattern.compile(".*/\\*\\*.*\\*/");
System.out.println("checking text for pattern: " + p);
Matcher m = p.matcher(scriptContents);
if (m.find()) {
    System.out.println("Found Match");
    System.out.println(m.group(1));
}

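6 answers:

Answer 0 (score: 1)

String.split("/*") will split the string into two strings, so grab the second one, then call String.split("*/") on it and grab the first one; that should be your text.

Edit: hmm, the asterisks don't show up in my comment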
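
The split idea above can be sketched as follows. One detail to watch: String.split takes a regular expression, so the delimiters are passed through Pattern.quote here to keep the asterisks literal. The class and method names (SplitJavadocDemo, extractBody) are made up for this illustration and are not from the answer.

```java
import java.util.regex.Pattern;

class SplitJavadocDemo {
    // Returns the text between the first "/**" and the next "*/",
    // or null if no complete block is found.
    static String extractBody(String source) {
        // Limit 2: split only at the first occurrence of the delimiter.
        String[] afterOpen = source.split(Pattern.quote("/**"), 2);
        if (afterOpen.length < 2) {
            return null; // no opening delimiter
        }
        String[] beforeClose = afterOpen[1].split(Pattern.quote("*/"), 2);
        if (beforeClose.length < 2) {
            return null; // no closing delimiter
        }
        return beforeClose[0];
    }

    public static void main(String[] args) {
        String source = "/**\n* Line one\n* Line two\n*/\nvar x = 1;";
        // The leading asterisks on each line are still present at this point.
        System.out.println(extractBody(source));
    }
}
```

This only returns the first block and leaves the per-line asterisks in place, so it is a starting point rather than a full solution.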

Answer 1 (score: 1)

You can try something like
Pattern p = Pattern.compile("/\\*\\*.*?\\*/",Pattern.DOTALL);

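This regex uses the DOTALL flag so that . also matches line separators, which means .* can now match substrings that span multiple lines.
You also need the smallest possible match between /** and */, so you need to use .*? (called a reluctant quantifier) instead of .*. This way you will find the minimal matches

text /** first doc */ whatever /** first doc */ text
     ^^^^^^^^^^^^^^^^          ^^^^^^^^^^^^^^^^

rather than the maximal one (which is how regex quantifiers behave by default)

text /** first doc */ whatever /** first doc */ text
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^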

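Note that a regex cannot tell whether the matched text comes from a real doc comment or from something else, such as

  • a string literal: "text /** whatever */ another test"
  • or a line comment: // text /** whatever */ another test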
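
As a minimal sketch of how this could be put together, the code below adds a capture group to the pattern above so that only the body is returned, and then removes the leading asterisk of each line with a second, MULTILINE pattern. The capture group, the asterisk-stripping step and all names (ReluctantJavadocDemo, extractBodies, greedyMatch) are additions for illustration. The caveat about string literals and line comments still applies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantJavadocDemo {
    // Group 1 captures everything between "/**" and the nearest "*/".
    private static final Pattern DOC =
        Pattern.compile("/\\*\\*(.*?)\\*/", Pattern.DOTALL);

    // Same pattern with a greedy quantifier, kept only for comparison.
    private static final Pattern GREEDY_DOC =
        Pattern.compile("/\\*\\*.*\\*/", Pattern.DOTALL);

    // An asterisk at the start of a line, with optional blanks before it
    // and at most one blank after it.
    private static final Pattern LEADING_STAR =
        Pattern.compile("^[ \\t]*\\*[ \\t]?", Pattern.MULTILINE);

    static List<String> extractBodies(String source) {
        List<String> bodies = new ArrayList<>();
        Matcher m = DOC.matcher(source);
        while (m.find()) {
            String body = LEADING_STAR.matcher(m.group(1)).replaceAll("");
            bodies.add(body.trim());
        }
        return bodies;
    }

    // Returns the first greedy match, or null if there is none.
    static String greedyMatch(String source) {
        Matcher m = GREEDY_DOC.matcher(source);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String source = "text /** first doc */ whatever /** second doc */ text";
        System.out.println(extractBodies(source)); // two separate bodies
        System.out.println(greedyMatch(source));   // one match spanning both
    }
}
```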


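Answer 2 (score: 0)

This searches the input string line by line, extracts the body from each JavaDoc block (there are two in this sample code), and uses a regex to eliminate the optional asterisk at the start of each line. Note that this assumes the body text is not on the same line as the opening and closing markers, that is, /** and */ are each on a line by themselves.

Top of the class: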

import  java.util.regex.Matcher;
import  java.util.regex.Pattern;
/**
   <P>{@code java ExtractJavaDocBody}</P>
 **/
public class ExtractJavaDocBody  {
   public static final void main(String[] ignored)  {

Build the demo input:

      String sLS = System.getProperty("line.separator", "\r\n");
      StringBuilder input = new StringBuilder().
         append("...                     ").append(sLS).
         append("/").append("**          ").append(sLS).
         append("* Comment section for Asset Record config OnStatusChange").append(sLS).
         append("*                       ").append(sLS).
         append("* Updated for HT342408  Set Assetmeters to inactive when the asset they are").append(sLS).
         append("* associated with is retired. This will also cause the condition monitoring").append(sLS).
         append("* point associated with the meter to be displayed as inactive.").append(sLS).
         append("*").append("/            ").append(sLS).
         append("...                      ").append(sLS).
         append("/").append("**           ").append(sLS).
         append("* Another block line 1   ").append(sLS).
         append("*                        ").append(sLS).
         append("* Another block line 2   ").append(sLS).
         append("* Another block line 3   ").append(sLS).
         append("* Another block line 4   ").append(sLS).
         append("*").append("/            ").append(sLS).
         append("...                      ").append(sLS);
      String[] lines = input.toString().split(sLS);

Main logic:

      //"": To reuse matcher
      Matcher mtchrPostAstrsk = Pattern.compile("^\\*?[ \t]*(.*)$").matcher("");

      boolean isBlockStarted = false;
      for(String line : lines)  {
         line = line.trim();
         if(!isBlockStarted)  {
            if(line.startsWith("/" + "*"))  {
               //Assumes body starts on next line
               isBlockStarted = true;
            }
            continue;
         }  else if(line.endsWith("*" + "/"))  {
            isBlockStarted = false;
         }  else  {
            //Block is started
            mtchrPostAstrsk.reset(line).matches(); //Actually does the match

            //Trim to eliminate spaces between asterisk and text
            System.out.println(mtchrPostAstrsk.group(1).trim());
         }

      }

   }
}

Output:

[C:\java_code\]java ExtractJavaDocBody
Comment section for Asset Record config OnStatusChange

Updated for HT342408  Set Assetmeters to inactive when the asset they are
associated with is retired. This will also cause the condition monitoring
point associated with the meter to be displayed as inactive.
Another block line 1

Another block line 2
Another block line 3
Another block line 4

Answer 3 (score: 0)

If you need a substring of the text block, use substring

    String commentText = "/** * Comment section for ... as inactive. */";
    int startIndex = commentText.indexOf("/**") + "/**".length();
    int endIndex = commentText.lastIndexOf("*/");
    String commentSubstring = commentText.substring(startIndex, endIndex);
    System.out.println(commentSubstring); //  * Comment section for ... as inactive.
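
A slightly more defensive variant, as a sketch: lastIndexOf("*/") picks the last comment terminator in the whole string, which overshoots when the source contains more than one comment, and indexOf returns -1 when a delimiter is missing. The version below searches for the closer only after the opener and checks both indexes. The names SubstringJavadocDemo and firstDocBody are hypothetical.

```java
class SubstringJavadocDemo {
    // Returns the trimmed text of the first "/** ... */" block, or null if none.
    static String firstDocBody(String source) {
        int open = source.indexOf("/**");
        if (open < 0) {
            return null; // no opening delimiter
        }
        int start = open + "/**".length();
        // Look for the terminator only after the opener.
        int end = source.indexOf("*/", start);
        if (end < 0) {
            return null; // unterminated comment
        }
        return source.substring(start, end).trim();
    }

    public static void main(String[] args) {
        String source = "/** doc text */ code(); /* other comment */";
        System.out.println(firstDocBody(source));
    }
}
```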

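Answer 4 (score: 0)

This is adapted from a regex for C/C++ style comments. Run it as a global search; block comments come back in capture group 1. So if the length of capture group 1 is > 0, a /** comments **/ style block was found.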

    # (?:(/\*[^*]*\*+(?:[^/*][^*]*\*+)*/)|//(?:[^\\]|\\\n?)*?\n)|(?:"(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|[\S\s][^/"'\\]*)
    # "(?:(/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/)|//(?:[^\\\\]|\\\\\\n?)*?\\n)|(?:\"(?:\\\\[\\S\\s]|[^\"\\\\])*\"|'(?:\\\\[\\S\\s]|[^'\\\\])*'|[\\S\\s][^/\"'\\\\]*)"     

    (?:                           # Comments 

         (                             # (1 start)
              /\*                           # Start /* .. */ comment
              [^*]* \*+
              (?: [^/*] [^*]* \*+ )*
              /                             # End /* .. */ comment
         )                             # (1 end)
      |  
         //                            # Start // comment
         (?: [^\\] | \\ \n? )*?        # Possible line-continuation
         \n                            # End // comment
    )
 |  
    (?:                           # Non - comments 
         "
         (?: \\ [\S\s] | [^"\\] )*     # Double quoted text
         "
      |  '
         (?: \\ [\S\s] | [^'\\] )*     # Single quoted text
         ' 
      |  [\S\s]                        # Any other char
         [^/"'\\]*                     # Chars which don't start a comment, string, escape,
                                       # or line continuation (escape + newline)
    )

Perl test case

$/ = undef;
$str = <DATA>;

while ($str =~ /(?:(\/\*[^*]*\*+(?:[^\/*][^*]*\*+)*\/)|\/\/(?:[^\\]|\\\n?)*?\n)|(?:"(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|[\S\s][^\/"'\\]*)/g)
{
      if (length ($1) > 0)
      { print "'$1'\n";}
}

__DATA__

/** 
* Comment section for Asset Record config OnStatusChange
*
* Updated for HT342408  Set Assetmeters to inactive when the asset they are
* associated with is retired. This will also cause the condition monitoring
* point associated with the meter to be displayed as inactive.
*/


if (ASSET.retired_date.isnull){
  ASSET.retired_date = new Date();
}

var meterset = ASSET.ASSETMETER;
for (var x = 0; x < meterset.length; x++){
//println('*********meterset['+ x + '].assetmeterid' + meterset[x].assetmeterid);
meterset[x].active = false; 
//println('*********meterset['+ x + '].active' + meterset[x].active);
}

Output >>

'/**
* Comment section for Asset Record config OnStatusChange
*
* Updated for HT342408  Set Assetmeters to inactive when the asset they are
* associated with is retired. This will also cause the condition monitoring
* point associated with the meter to be displayed as inactive.
*/'
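
Since the question is about Java, here is a sketch of how the Java-escaped form of the regex above might be driven with Matcher.find(). In Java, group(1) returns null when the group did not take part in the match, so the check is against null rather than a length. The extra startsWith("/**") filter (group 1 also captures ordinary /* */ comments) and the names CommentTokenDemo and docComments are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentTokenDemo {
    // The Java-escaped regex from the answer, split across two lines.
    private static final Pattern TOKENS = Pattern.compile(
        "(?:(/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/)|//(?:[^\\\\]|\\\\\\n?)*?\\n)"
        + "|(?:\"(?:\\\\[\\S\\s]|[^\"\\\\])*\"|'(?:\\\\[\\S\\s]|[^'\\\\])*'|[\\S\\s][^/\"'\\\\]*)");

    // Collects block comments that start with "/**", skipping anything that
    // merely looks like a comment inside a string literal or a // comment.
    static List<String> docComments(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = TOKENS.matcher(source);
        while (m.find()) {
            String block = m.group(1); // null unless a /* ... */ comment matched
            if (block != null && block.startsWith("/**")) {
                found.add(block);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String source =
            "String s = \"/* not a comment */\";\n"
            + "// line /* nor this */\n"
            + "/** real doc */\n"
            + "int x;\n";
        System.out.println(docComments(source));
    }
}
```

Unlike the simpler patterns, this one consumes string literals and line comments as whole tokens, which is why comment-like text inside them is not reported.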

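Answer 5 (score: 0)

I created a class called FilteredLineIterator that can do what you want.

FilteredLineIterator filters another string iterator (usually the lines in a text file), keeping or discarding each line based on the entities in which it exists: "block", "single line", and "stealth block" entities. Each kept line can be altered.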

(FilteredLineIterator is part of XBN-Java, which has to be downloaded separately.)

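The example below demonstrates FilteredLineIterator, where all lines are kept, but only the lines in JavaDoc blocks are altered (specifically their "mid" lines). The alteration is a regex replacement that eliminates any leading asterisk, which is roughly what javadoc.exe does.

(I have added an extra "normal" [non-JavaDoc] multi-line comment block to the start of your input, since a real source file is likely to contain at least one. Because both "normal" and JavaDoc comment blocks end with "*/", this demonstrates how to prevent the end line of a "normal" multi-line comment from triggering a false "end line found before block opened" error. This is done with a "stealth block", whose only purpose is to prevent false positives like this one.)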

   import  com.github.xbn.linefilter.FilteredLineIterator;
   import  com.github.xbn.linefilter.KeepUnmatched;
   import  com.github.xbn.linefilter.Returns;
   import  com.github.xbn.linefilter.alter.NewTextLineAltererFor;
   import  com.github.xbn.linefilter.alter.TextLineAlterer;
   import  com.github.xbn.linefilter.entity.BlockEntity;
   import  com.github.xbn.linefilter.entity.EntityRequired;
   import  com.github.xbn.linefilter.entity.KeepMatched;
   import  com.github.xbn.linefilter.entity.NewBlockEntityFor;
   import  com.github.xbn.linefilter.entity.NewStealthBlockEntityFor;
   import  com.github.xbn.linefilter.entity.StealthBlockEntity;
   import  com.github.xbn.regexutil.ReplacedInEachInput;
   import  com.github.xbn.testdev.GetFromCommandLineAtIndex;
   import  com.github.xbn.util.IncludeJavaDoc;
   import  java.util.Iterator;
   import  java.util.regex.Pattern;
/**
   <P>{@code java StripOptionalAsterisksFromJDLineStarts C:\java_code\example_input\JavaSnippetWithJDBlockAsterisksEachLine_input.txt}</P>
 **/
public class StripOptionalAsterisksFromJDLineStarts  {
   public static final void main(String[] cmd_lineParams)  {
      //Example setup:
         Iterator<String> itr = GetFromCommandLineAtIndex.fileLineIterator(
            cmd_lineParams, 0,
            null);   //debugPath

The example proper:

      StealthBlockEntity javaMlcBlock = NewStealthBlockEntityFor.javaComment(
         "comment", IncludeJavaDoc.NO,
         null,       //dbgStart
         null,       //dbgEnd
         KeepMatched.YES, EntityRequired.YES, null,
         null);      //dbgLineNums

      TextLineAlterer stripAsterisks = NewTextLineAltererFor.replacement(
         Pattern.compile("[ \t]*\\*(.*)"), "$1",
         ReplacedInEachInput.FIRST,
         null,       //debug
         null);

      BlockEntity javaDocBlock = NewBlockEntityFor.javaDocComment_Cfg(
         "doccomment",
         null,       //dbgStart
         null,       //dbgEnd
         EntityRequired.YES, null,
         null).      //dbgLineNums
         midAlter(stripAsterisks).
         keepAll().build();

      FilteredLineIterator filteredItr = new FilteredLineIterator(
         itr, Returns.KEPT, KeepUnmatched.YES,
         null, null,    //dbgEveryLine and its line-range
         javaMlcBlock, javaDocBlock);

      while(filteredItr.hasNext())  {
         System.out.println(filteredItr.next());
      }
   }
}

Output:

/*
   A non-JavaDoc multi-line comment to emphasize the need for stealth-blocks
*/
/**
 Comment section for Asset Record config OnStatusChange

 Updated for HT342408  Set Assetmeters to inactive when the asset they are
 associated with is retired. This will also cause the condition monitoring
 point associated with the meter to be displayed as inactive.
*/


if (ASSET.retired_date.isnull){
  ASSET.retired_date = new Date();
}

var meterset = ASSET.ASSETMETER;
for (var x = 0; x < meterset.length; x++){
//println('*********meterset['+ x + '].assetmeterid' + meterset[x].assetmeterid);
meterset[x].active = false;
//println('*********meterset['+ x + '].active' + meterset[x].active);
}
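
For readers who do not want the extra dependency, the same idea can be approximated in plain Java. This is a rough sketch and not the XBN-Java API: it keeps every line, tracks whether it is inside a JavaDoc block or a normal multi-line comment, and strips the leading asterisk only from the middle lines of JavaDoc blocks. It assumes the opening and closing markers sit on their own lines, and every name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StripJavadocAsterisksDemo {
    static List<String> stripDocAsterisks(List<String> lines) {
        List<String> out = new ArrayList<>();
        boolean inDoc = false;   // inside a /** ... */ block
        boolean inPlain = false; // inside a /* ... */ block (left untouched)
        for (String line : lines) {
            String t = line.trim();
            if (inDoc) {
                if (t.endsWith("*/")) {
                    inDoc = false;
                    out.add(line); // closing line is kept as-is
                } else {
                    // Middle line: drop optional blanks plus one leading asterisk.
                    out.add(line.replaceFirst("^[ \\t]*\\*", ""));
                }
            } else if (inPlain) {
                if (t.endsWith("*/")) {
                    inPlain = false;
                }
                out.add(line);
            } else {
                // Check "/**" before "/*", since the former also starts with "/*".
                if (t.startsWith("/**") && !t.endsWith("*/")) {
                    inDoc = true;
                } else if (t.startsWith("/*") && !t.endsWith("*/")) {
                    inPlain = true;
                }
                out.add(line);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "/*", "   A normal * comment", "*/",
            "/**", "* Doc line", "*", "*/",
            "meterset[x].active = false;");
        for (String line : stripDocAsterisks(input)) {
            System.out.println(line);
        }
    }
}
```

Tracking the normal comment separately plays the role the stealth block plays above: its closing "*/" is consumed without being mistaken for the end of a JavaDoc block.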