Using Ant, replace all _ in a file, but only within specific strings?

Date: 2014-07-31 15:28:48

Tags: regex ant

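This seems like it should be simple: with an Ant task, can I use replaceregexp to replace every occurrence of a particular repeated character, but only inside certain strings in a file?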

File contents:

Blah blah blah <ac:link> words_with_underscores_to_turn_to_spaces</link>
Blah blah blah Blah blah blah Blah blah blah Blah blah blah
Words_with_underscores_that_I_want_to_keep. Blah blah blah Blah blah blah. 

Desired result:

Blah blah blah <ac:link> words with underscores to turn to spaces</link> 
Blah blah blah Blah blah blah Blah blah blah Blah blah blah 
Words_with_underscores_that_I_want_to_keep. Blah blah blah Blah blah blah. 

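I can use replaceregexp to match <ac:link.*?/link> and restrict the replacement to those strings, but then how do I tell it to replace every underscore it finds inside each matched string, wherever the underscores fall? The underscored text does not always contain the same number of words.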
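The core of the problem is a nested substitution: find each <ac:link>...</link> span, then replace every underscore inside that span, however many there are. A minimal sketch in JavaScript (the language the answer below embeds in Ant) uses String.prototype.replace with a replacer function to do exactly that. It assumes the tags are literally <ac:link> and </link>, as in the sample, and the function name spaceOutLinkUnderscores is hypothetical.

```javascript
// Sketch: rewrite underscores only inside <ac:link>...</link> spans.
// Assumption: the tags are exactly "<ac:link>" and "</link>", as in the sample.
// spaceOutLinkUnderscores is a hypothetical name used for illustration.
function spaceOutLinkUnderscores(text) {
  // [\s\S]*? matches lazily, including newlines, so each span ends at the
  // first "</link>" that follows its opening tag.
  const linkSpan = /<ac:link>[\s\S]*?<\/link>/g;
  // The replacer receives each matched span and rewrites every "_" in it;
  // text outside the spans is passed through unchanged.
  return text.replace(linkSpan, (span) => span.replace(/_/g, ' '));
}

const input =
  'Blah blah blah <ac:link> words_with_underscores_to_turn_to_spaces</link>\n' +
  'Words_with_underscores_that_I_want_to_keep. Blah blah blah.\n';
console.log(spaceOutLinkUnderscores(input));
```

Because the inner replace uses the g flag, the number of underscores (and words) inside a link does not matter.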

I also tried a copy-task approach, like this:

  <copy todir=".\test_output">
    <filterchain>
      <tokenfilter>
        <containsregex pattern="(ac:link.*?link)" flags="gi"/>
        <replacestring from="_" to=" "/>
      </tokenfilter>
    </filterchain>
    <fileset dir=".\underscore_test_output" includes="**/*.txt"/>
  </copy>

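This replaces the underscores in the links with spaces and writes the links to a new file, but it leaves out the rest of the source file, because I'm only matching the links. Any ideas?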

1 answer:

Answer 0 (score: 1)

Using <scriptfilter> is a great way to apply conditional logic inside a <filterchain>.

In the script below, <filetokenizer/> treats the entire input file as a single token. This lets the JavaScript match the tags even when they span line breaks.
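Why the whole-file token matters can be sketched in plain JavaScript, assuming that without <filetokenizer/> the filter sees the file roughly one line at a time. The sample text, in which the link wraps onto a second line, is hypothetical, as is the helper name rewriteLinks.

```javascript
// Hypothetical sample: the link text wraps onto a second line.
const fileText = 'See <ac:link> some_long_\nlink_name</link> and keep_this.\n';

// Hypothetical helper: replace "_" with " " only inside <ac:link>...</link>.
function rewriteLinks(token) {
  return token.replace(/<ac:link>[\s\S]*?<\/link>/g, (span) => span.replace(/_/g, ' '));
}

// Line at a time: neither line contains both tags, so nothing matches.
const perLine = fileText.split('\n').map(rewriteLinks).join('\n');
// Whole file as one token (what <filetokenizer/> provides): the span matches.
const wholeFile = rewriteLinks(fileText);

console.log(perLine);
console.log(wholeFile);
```

In the per-line result the link keeps its underscores; in the whole-file result they become spaces, while keep_this outside the link is left alone in both.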

Ant script

<copy todir="${out.dir}">
  <fileset dir="${basedir}" includes="test.txt"/>
  <filterchain>
    <tokenfilter>
      <filetokenizer/>
      <scriptfilter language="javascript"><![CDATA[
        var originalFile = self.getToken();
        var originalFileIndex = 0;
        var transformedFile = '';
        var keepGoing = true;

        // The "ac:" vs. no "ac:" discrepancy between the opening and closing
        // tags comes from the sample text in the question.
        var openingTagFormat = '<ac:link>';
        var closingTagFormat = '</link>';

        while (keepGoing) {
          var openingAcLinkBeginIndex = originalFile.indexOf(openingTagFormat, originalFileIndex);
          keepGoing = openingAcLinkBeginIndex > -1;
          if (keepGoing) {
            var openingAcLinkEndIndex = openingAcLinkBeginIndex + openingTagFormat.length;
            var closingAcLinkBeginIndex = originalFile.indexOf(closingTagFormat, openingAcLinkEndIndex);
            keepGoing = closingAcLinkBeginIndex > -1;
            if (keepGoing) {
              transformedFile += originalFile.slice(originalFileIndex, openingAcLinkEndIndex);
              var closingAcLinkEndIndex = closingAcLinkBeginIndex + closingTagFormat.length;
              var stringBetweenAcLinkTags = originalFile.slice(openingAcLinkEndIndex, closingAcLinkBeginIndex);
              transformedFile += stringBetweenAcLinkTags.replace(/_/g, ' ');
              transformedFile += originalFile.slice(closingAcLinkBeginIndex, closingAcLinkEndIndex);
              originalFileIndex = closingAcLinkEndIndex;
            }
          }
        }

        transformedFile += originalFile.substring(originalFileIndex);

        self.setToken(transformedFile);
      ]]></scriptfilter>
    </tokenfilter>
  </filterchain>
</copy>

Output

Blah blah blah <ac:link> words with underscores to turn to spaces</link>
Blah blah blah Blah blah blah Blah blah blah Blah blah blah
Words_with_underscores_that_I_want_to_keep. Blah blah blah Blah blah blah.
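To try the script's logic without running Ant, the loop can be lifted into a plain function that takes the file text as an argument instead of calling self.getToken(). This is a sketch that restructures the same indexOf loop with break statements; the name transformAcLinks is hypothetical. It also makes one behavior easy to see: an <ac:link> with no closing </link> stops the rewriting, and everything from that point on is copied unchanged.

```javascript
// Hypothetical standalone version of the <scriptfilter> body.
function transformAcLinks(originalFile) {
  const openingTagFormat = '<ac:link>';
  const closingTagFormat = '</link>';
  let originalFileIndex = 0;
  let transformedFile = '';

  while (true) {
    const openBegin = originalFile.indexOf(openingTagFormat, originalFileIndex);
    if (openBegin < 0) break; // no more opening tags
    const openEnd = openBegin + openingTagFormat.length;
    const closeBegin = originalFile.indexOf(closingTagFormat, openEnd);
    if (closeBegin < 0) break; // unterminated link: stop rewriting
    const closeEnd = closeBegin + closingTagFormat.length;
    // Copy everything up to and including the opening tag unchanged.
    transformedFile += originalFile.slice(originalFileIndex, openEnd);
    // Rewrite only the text between the tags.
    transformedFile += originalFile.slice(openEnd, closeBegin).replace(/_/g, ' ');
    // Copy the closing tag and continue searching after it.
    transformedFile += originalFile.slice(closeBegin, closeEnd);
    originalFileIndex = closeEnd;
  }
  // Whatever follows the last complete link is copied unchanged.
  return transformedFile + originalFile.substring(originalFileIndex);
}

const sample =
  'Blah blah blah <ac:link> words_with_underscores_to_turn_to_spaces</link>\n' +
  'Words_with_underscores_that_I_want_to_keep.\n';
console.log(transformAcLinks(sample));
console.log(transformAcLinks('a_b <ac:link>c_d'));
```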