Windows batch regex search and replace

Time: 2013-02-13 14:40:23

Tags: windows batch-file

I have a set of data like this:

7859 10000:00 7859 10000:00 (xfer#1, to-check=1033/1035)
32768 000:17 22174479 10000:00 (xfer#2, to-check=1032/1035)

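I read them from a file and pass them line by line to a method in my batch script. In that method I only want to extract

7859

and

22174479

from these lines. The number I need is whatever comes after "\d+:\d\d\s+", and it is followed by another "\d\d.*".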

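Can this be done with only batch script regex search and replace? I've read a bunch of articles but couldn't find a solution. In the end, I want to add the numbers.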
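Below is a minimal sketch of the pattern described above. It is written in plain ECMAScript, which is the same regex dialect JScript uses, and it runs under Node.js. The helper name extractNumber is hypothetical. It captures the digits that follow the first "\d+:\d\d\s+". The sum at the end assumes that "add the numbers" means totalling them.

```javascript
// Hypothetical helper: return the digits right after the first "\d+:\d\d\s+",
// or null when the line has no such part.
function extractNumber(line) {
  const m = /\d+:\d\d\s+(\d+)/.exec(line);
  return m ? Number(m[1]) : null;
}

const lines = [
  "7859 10000:00 7859 10000:00 (xfer#1, to-check=1033/1035)",
  "32768 000:17 22174479 10000:00 (xfer#2, to-check=1032/1035)",
];

const numbers = lines.map(extractNumber);
const total = numbers.reduce((sum, n) => sum + n, 0); // assumes "add" means sum
console.log(numbers.join(" + "), "=", total); // 7859 + 22174479 = 22182338
```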

Thanks

Edit
Andrei commented on David Ruhmann's answer that he wants the token 2 positions before "(xfer#", not the 3rd token from the start of the line.

4 answers:

Answer 0 (score: 1)

  :: Does %variable% =~ s/old/new/ (requires perl.exe on the PATH)
  setlocal ENABLEDELAYEDEXPANSION
  :: Quoting the SET keeps trailing spaces out of the stored value
  for /f "delims=" %%a in ('echo !variable! ^|perl -pe "s/regexp/replace/" ') do set "variable=%%a"
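Here is the same substitution with a concrete pattern filled in. It is written in ECMAScript so it can be tried without Perl. The pattern is an assumption drawn from the question's sample lines: keep the third whitespace-delimited field when it is numeric. In the Perl one-liner it would read s/^\S+\s+\S+\s+(\d+).*$/$1/. As with perl -pe, a line that does not match passes through unchanged.

```javascript
// Same substitution as: perl -pe "s/^\S+\s+\S+\s+(\d+).*$/$1/"
// Assumed pattern: keep only the third whitespace-delimited field when it is numeric.
function substitute(line) {
  return line.replace(/^\S+\s+\S+\s+(\d+).*$/, "$1");
}

console.log(substitute("32768 000:17 22174479 10000:00 (xfer#2, to-check=1032/1035)")); // 22174479
console.log(substitute("a line with another shape")); // printed unchanged
```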

Answer 1 (score: 0)

Note that batch is not the best language for regex! Cmd processes input one line at a time, whereas regex can work across multiple lines.
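To illustrate the multi-line point, this ECMAScript sketch processes a whole block of text in one pass instead of line by line. The m flag makes ^ match at the start of each line, and g returns every match. The pattern (the third whitespace-delimited field, when numeric) is an assumption based on the question's samples.

```javascript
// Process the whole text in one pass: "m" makes ^ match at the start of every
// line, and "g" (required by matchAll) collects every match.
const text = [
  "7859 10000:00 7859 10000:00 (xfer#1, to-check=1033/1035)",
  "32768 000:17 22174479 10000:00 (xfer#2, to-check=1032/1035)",
].join("\r\n");

// [ \t]+ rather than \s+ so a match can never run across a line break.
const thirdNumericField = /^\S+[ \t]+\S+[ \t]+(\d+)/gm;
const found = Array.from(text.matchAll(thirdNumericField), (m) => m[1]);
console.log(found.join(", ")); // 7859, 22174479
```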

It sounds like you just need to grab tokens from the line. Suppose the full regex for the line is something like (\d+\s+\d+:\d\d\s+)+\(xfer#\d+, to-check=\d+/\d+\)

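That tells us the line contains constant delimiters: colons (:) and whitespace (\s+). From there, it's just a matter of using those anchors to determine token positions.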
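One quick way to confirm that the sample lines really have this shape is to test the pattern anchored at both ends. This ECMAScript sketch adds ^ and $ anchors, which are not in the answer's pattern. It checks the three sample lines that appear in this thread.

```javascript
// The answer's full-line pattern, with ^ and $ anchors added for this check.
const lineShape = /^(\d+\s+\d+:\d\d\s+)+\(xfer#\d+, to-check=\d+\/\d+\)$/;

const samples = [
  "7859 10000:00 7859 10000:00 (xfer#1, to-check=1033/1035)",
  "32768 000:17 22174479 10000:00 (xfer#2, to-check=1032/1035)",
  "32768 004:47 2686976 2200:03 11707819 10000:01 (xfer#5264, to-check=1020/6975)",
];

for (const s of samples) {
  console.log(lineShape.test(s) ? "matches:" : "no match:", s);
}
```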


Extract the third whitespace-delimited token from the line.

for /f "tokens=3" %%A in ("line") do echo %%A
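For readers unfamiliar with FOR /F, this ECMAScript sketch roughly mimics what tokens=3 does with the default delimiters, space and tab. A run of delimiters counts as one, and leading delimiters are ignored. The function name thirdToken is hypothetical. Other FOR /F rules, such as eol=; line skipping, are not modelled.

```javascript
// Rough emulation of: for /f "tokens=3" %%A in ("line") do echo %%A
// Default FOR /F delimiters are space and tab; a run of them counts as one
// delimiter, and filter(Boolean) drops the empty string a leading run produces.
function thirdToken(line) {
  return line.split(/[ \t]+/).filter(Boolean)[2];
}

console.log(thirdToken("7859 10000:00 7859 10000:00 (xfer#1, to-check=1033/1035)"));    // 7859
console.log(thirdToken("32768 000:17 22174479 10000:00 (xfer#2, to-check=1032/1035)")); // 22174479
```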

Extract the second whitespace-delimited token from the second colon-delimited token.

for /f "tokens=2 delims=:" %%A in ("line") do (
    for /f "tokens=2" %%B in ("%%A") do echo %%B
)
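The nested loop can be mimicked in the same way. This is a rough ECMAScript sketch, and the helper names are hypothetical. Running it on the longer line from the update below shows why a fixed token position breaks when the number of fields changes: it returns 2686976, not the token two positions before "(xfer#".

```javascript
// Rough emulation of the nested loop: 2nd colon-delimited piece, then the
// 2nd space/tab-delimited token inside it (consecutive delimiters count as one).
function tokens(text, delims) {
  return text.split(delims).filter(Boolean);
}

function secondOfSecond(line) {
  const piece = tokens(line, /:+/)[1]; // tokens=2 delims=:
  return piece === undefined ? undefined : tokens(piece, /[ \t]+/)[1]; // tokens=2
}

console.log(secondOfSecond("7859 10000:00 7859 10000:00 (xfer#1, to-check=1033/1035)")); // 7859
console.log(secondOfSecond(
  "32768 004:47 2686976 2200:03 11707819 10000:01 (xfer#5264, to-check=1020/6975)"
)); // 2686976
```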

Update

Extract the second token of the colon-delimited piece that ends at the last colon.

@echo off
setlocal EnableExtensions EnableDelayedExpansion
set "Line=32768 004:47 2686976 2200:03 11707819 10000:01 (xfer#5264, to-check=1020/6975)"

set "Last="
set "This="
for /f "delims=" %%A in ('echo("%Line::="^&echo("%"') do (
    for /f "tokens=2" %%B in ("%%A") do (
        if defined This set "Last=!This!"
        set "This=%%B"
    )
)
echo %Last%

endlocal
pause >nul

Limitations

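  1. A line containing an odd number of double quotes (") will crash the script. One way to prevent this is to strip the quotes with set Line=%Line:"=% before the for loop.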
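As a cross-check, the colon-splitting trick from the update can be written directly in ECMAScript. This is a sketch, and the function name is hypothetical. It keeps the second whitespace-delimited token of the piece that ends at the last colon. Because it never passes the line through echo, the odd-quote limitation above does not apply to it.

```javascript
// Split on colons and take the 2nd space/tab-delimited token of the piece that
// ends at the last colon; for lines shaped like the samples this is the token
// two positions before "(xfer#".
function tokenBeforeLastColon(line) {
  const pieces = line.split(/:+/).filter(Boolean);
  if (pieces.length < 2) return undefined; // no colon: nothing to extract
  return pieces[pieces.length - 2].split(/[ \t]+/).filter(Boolean)[1];
}

const lines = [
  "7859 10000:00 7859 10000:00 (xfer#1, to-check=1033/1035)",
  "32768 000:17 22174479 10000:00 (xfer#2, to-check=1032/1035)",
  "32768 004:47 2686976 2200:03 11707819 10000:01 (xfer#5264, to-check=1020/6975)",
];
console.log(lines.map(tokenBeforeLastColon).join(", ")); // 7859, 22174479, 11707819
```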

Answer 2 (score: 0)

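The simplest and most flexible way to achieve your goal is to use awk (see its regexp examples) or sed from GnuWin32, for example: sed -i -r -e "s/([0-9]+:[0-9][0-9][[:space:]]+)[0-9]+/\1replacementstring/g" filename. Both tools use POSIX extended regular expressions rather than Perl syntax, so write [0-9] instead of \d. I think what you're doing is exactly what awk was designed for.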

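If you'd rather stick to what ships with Windows and avoid third-party tools, you can do the regex matching in VBScript. To run VBScript from batch, echo the script into a .vbs file, call cscript on that file, and capture its output. Here's a proof of concept.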

@echo off & setlocal enabledelayedexpansion

:: rxp.bat
:: rxp /? for usage instructions

if #%4==# goto usage
set global=false
set replace=false
for %%I in (%*) do (
    if not #!next!==# (
        if !next!==string set string=%%I
        if !next!==pattern set pattern=%%I
        if !next!==replace set replace=%%I
        set next=
    )
    if #%%I==#/s set next=string
    if #%%I==#/p set next=pattern
    if #%%I==#/r set next=replace
    if #%%I==#/g set global=true
)
if not defined string goto usage
if not defined pattern goto usage

:: Double every quote for VBScript, then turn the user's \"" (an escaped quote) into ""
set string=!string:"=""!
set string=!string:\""=""!
set pattern=!pattern:"=""!
set pattern=!pattern:\""=""!
if #!replace!==#false (
    call :rxp !string:~1,-1! !pattern:~1,-1! !global!
) else (
    set replace=!replace:"=""!
    set replace=!replace:\""=""!
    call :rxp !string:~1,-1! !pattern:~1,-1! !global! !replace:~1,-1!
)
goto :EOF

:rxp string pattern global replacement
echo Set rxp = New RegExp>regexp.vbs
echo rxp.Pattern = %2>>regexp.vbs
echo rxp.Global = %3>>regexp.vbs
if #%4==# (
    echo Set res = rxp.Execute^(%1^)>>regexp.vbs
    echo For Each match in res>>regexp.vbs
    echo Wscript.Echo match.value>>regexp.vbs
    echo Next>>regexp.vbs
) else (
    echo Wscript.echo rxp.Replace^(%1, %4^)>>regexp.vbs
)
cscript /nologo regexp.vbs
del /q regexp.vbs
goto :EOF

:usage
echo Usage: %~nx0 /s "string" /p "regexp" [/g] [/r "replacement text"]
echo;
echo    /s -- search string
echo;
echo    /p -- regular expression pattern
echo          Example: /p "<[^>]+>" to search for markup tags
echo          matches ^<span class='a'^> or similar
echo;
echo    /r -- replacement text (optional)
echo          If specified, replace the matched text
echo          Example: /p "(<div class=')blue('>)" /r "$1red$2"
echo          matches ^<div class='blue'^>
echo          replaces match with ^<div class='red'^>
echo;
echo    /g -- global match (optional)
echo          match every occurrence (matches only the first by default)
echo;
echo notes: If the regexp pattern includes capturing parentheses, use ^$1-^$9 as
echo backreferences in your replacement text.  If any of your strings include
echo quotation marks, they can be escaped with a backslash (\).
echo;
echo Example:
echo %~nx0 /s "text begin <div id=\"foo\"> text end" /p "(<div)[^>]+(>)"
echo /r "$1 class=\"bar\"$2"
echo;
echo matches ^<div id="foo"^>, replaces match with ^<div class="bar"^>
echo output: text begin ^<div class="bar"^> text end

Sample output:

C:\Users\me\Desktop>rxp /s "7859 10000:00 7849 10000:00 (xfer#1, to-check=1033/1035)" /p "(\d+:\d\d\s+)\d+" /r "$1foo"
7859 10000:00 foo 10000:00 (xfer#1, to-check=1033/1035)

C:\Users\me\Desktop>rxp
Usage: rxp.bat /s "string" /p "regexp" [/g] [/r "replacement text"]

   /s -- search string

   /p -- regular expression pattern
         Example: /p "<[^>]+>" to search for markup tags
         matches <span class='a'> or similar

   /r -- replacement text (optional)
         If specified, replace the matched text

   /g -- global match (optional)
         match every occurrence (matches only the first by default)

notes: If the regexp pattern includes capturing parentheses, use $1-$9 as
backreferences in your replacement text.  If any of your strings include
quotation marks, they can be escaped with a backslash (\).

Example:
rxp.bat /s "text begin <div id=\"foo\"> text end" /p "(<div)[^>]+(>)"
/r "$1 class=\"bar\"$2"

matches <div id="foo">, replaces match with <div class="bar">
output: text begin <div class="bar"> text end
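rxp.bat builds a VBScript RegExp object. The same logic in ECMAScript looks roughly like the sketch below. It is not a drop-in replacement: VBScript's RegExp dialect is close to JavaScript's, but older. The function rxp and its options object are hypothetical names. The two calls reproduce the examples shown above.

```javascript
// Sketch of rxp.bat's logic with a JavaScript RegExp instead of VBScript's.
// Without a replacement it returns the match(es); with one it returns the new string.
function rxp(string, pattern, { global = false, replace } = {}) {
  const re = new RegExp(pattern, global ? "g" : "");
  if (replace !== undefined) {
    return string.replace(re, replace); // $1, $2, ... work as backreferences
  }
  if (global) {
    return Array.from(string.matchAll(re), (m) => m[0]);
  }
  const m = re.exec(string);
  return m ? [m[0]] : [];
}

console.log(rxp(
  "7859 10000:00 7849 10000:00 (xfer#1, to-check=1033/1035)",
  "(\\d+:\\d\\d\\s+)\\d+",
  { replace: "$1foo" }
)); // 7859 10000:00 foo 10000:00 (xfer#1, to-check=1033/1035)

console.log(rxp(
  'text begin <div id="foo"> text end',
  "(<div)[^>]+(>)",
  { replace: '$1 class="bar"$2' }
)); // text begin <div class="bar"> text end
```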

Answer 3 (score: 0)

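From your comment on David Ruhmann's answer, you want the token 2 positions before the "(xfer#" string. It can probably be done with native batch commands alone, but that is a nasty problem.

I'm assuming you can only use commands native to Windows, with no downloaded executables.

I hope you can use JScript, since it is native to Windows.

I wrote a hybrid JScript/batch utility script called REPL.BAT that performs regex search and replace. It takes little code, but it is a very useful utility, and it makes the solution very simple.

I use FINDSTR to filter out lines that don't have at least 2 space-delimited tokens before "(xfer#". I pipe the remaining lines to my REPL utility, which keeps only the desired token. The result is sent to stdout.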

findstr /r /c:" [^ ][^ ]* [^ ][^ ]* (xfer#" test.txt | repl ".* ([^ ]+) ([^ ]+) \(xfer#.*" "$1"
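To see what the pipeline does, here is an ECMAScript sketch of both stages. The first stage is the FINDSTR filter. FINDSTR has no + quantifier, so it writes [^ ][^ ]* where this sketch uses [^ ]+. The second stage is the REPL substitution that keeps $1. The non-matching input line is made up for illustration.

```javascript
// Stage 1, FINDSTR /R /C:" [^ ][^ ]* [^ ][^ ]* (xfer#": keep lines with at least
// two space-delimited tokens right before "(xfer#". FINDSTR has no "+", so
// "[^ ][^ ]*" is its spelling of "[^ ]+".
const keep = / [^ ]+ [^ ]+ \(xfer#/;

// Stage 2, REPL ".* ([^ ]+) ([^ ]+) \(xfer#.*" "$1": replace the whole line
// with the token two positions before "(xfer#".
const extract = /.* ([^ ]+) ([^ ]+) \(xfer#.*/;

function pipeline(lines) {
  return lines.filter((line) => keep.test(line)).map((line) => line.replace(extract, "$1"));
}

const input = [
  "7859 10000:00 7859 10000:00 (xfer#1, to-check=1033/1035)",
  "a made-up line that FINDSTR would drop",
  "32768 004:47 2686976 2200:03 11707819 10000:01 (xfer#5264, to-check=1020/6975)",
];
console.log(pipeline(input).join("\n"));
```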

Below is the code for the REPL.BAT utility script. Full documentation is embedded in the script.

@if (@X)==(@Y) @end /* Harmless hybrid line that begins a JScript comment

::************ Documentation ***********
:::
:::REPL  Search  Replace  [Options  [SourceVar]]
:::REPL  /?
:::
:::  Performs a global search and replace operation on each line of input from
:::  stdin and prints the result to stdout.
:::
:::  Each parameter may be optionally enclosed by double quotes. The double
:::  quotes are not considered part of the argument. The quotes are required
:::  if the parameter contains a batch token delimiter like space, tab, comma,
:::  semicolon. The quotes should also be used if the argument contains a
:::  batch special character like &, |, etc. so that the special character
:::  does not need to be escaped with ^.
:::
:::  If called with a single argument of /? then prints help documentation
:::  to stdout.
:::
:::  Search  - By default this is a case sensitive JScript (ECMA) regular
:::            expression expressed as a string.
:::
:::            JScript syntax documentation is available at
:::            http://msdn.microsoft.com/en-us/library/ae5bf541(v=vs.80).aspx
:::
:::  Replace - By default this is the string to be used as a replacement for
:::            each found search expression. Full support is provided for
:::            substitution patterns available to the JScript replace method.
:::            A $ literal can be escaped as $$. An empty replacement string
:::            must be represented as "".
:::
:::            Replace substitution pattern syntax is documented at
:::            http://msdn.microsoft.com/en-US/library/efy6s3e6(v=vs.80).aspx
:::
:::  Options - An optional string of characters used to alter the behavior
:::            of REPL. The option characters are case insensitive, and may
:::            appear in any order.
:::
:::            I - Makes the search case-insensitive.
:::
:::            L - The Search is treated as a string literal instead of a
:::                regular expression. Also, all $ found in Replace are
:::                treated as $ literals.
:::
:::            E - Search and Replace represent the name of environment
:::                variables that contain the respective values. An undefined
:::                variable is treated as an empty string.
:::
:::            M - Multi-line mode. The entire contents of stdin is read and
:::                processed in one pass instead of line by line. ^ anchors
:::                the beginning of a line and $ anchors the end of a line.
:::
:::            X - Enables extended substitution pattern syntax with support
:::                for the following escape sequences:
:::
:::                \\     -  Backslash
:::                \b     -  Backspace
:::                \f     -  Formfeed
:::                \n     -  Newline
:::                \r     -  Carriage Return
:::                \t     -  Horizontal Tab
:::                \v     -  Vertical Tab
:::                \xnn   -  Ascii (Latin 1) character expressed as 2 hex digits
:::                \unnnn -  Unicode character expressed as 4 hex digits
:::
:::                Escape sequences are supported even when the L option is used.
:::
:::            S - The source is read from an environment variable instead of
:::                from stdin. The name of the source environment variable is
:::                specified in the next argument after the option string.
:::

::************ Batch portion ***********
@echo off
if .%2 equ . (
  if "%~1" equ "/?" (
    findstr "^:::" "%~f0" | cscript //E:JScript //nologo "%~f0" "^:::" ""
    exit /b 0
  ) else (
    call :err "Insufficient arguments"
    exit /b 1
  )
)
echo(%~3|findstr /i "[^SMILEX]" >nul && (
  call :err "Invalid option(s)"
  exit /b 1
)
cscript //E:JScript //nologo "%~f0" %*
exit /b 0

:err
>&2 echo ERROR: %~1. Use REPL /? to get help.
exit /b

************* JScript portion **********/
var env=WScript.CreateObject("WScript.Shell").Environment("Process");
var args=WScript.Arguments;
var search=args.Item(0);
var replace=args.Item(1);
var options="g";
if (args.length>2) {
  options+=args.Item(2).toLowerCase();
}
var multi=(options.indexOf("m")>=0);
var srcVar=(options.indexOf("s")>=0);
if (srcVar) {
  options=options.replace(/s/g,"");
}
if (options.indexOf("e")>=0) {
  options=options.replace(/e/g,"");
  search=env(search);
  replace=env(replace);
}
if (options.indexOf("l")>=0) {
  options=options.replace(/l/g,"");
  search=search.replace(/([.^$*+?()[{\\|])/g,"\\$1");
  replace=replace.replace(/\$/g,"$$$$");
}
if (options.indexOf("x")>=0) {
  options=options.replace(/x/g,"");
  replace=replace.replace(/\\\\/g,"\\B");
  replace=replace.replace(/\\b/g,"\b");
  replace=replace.replace(/\\f/g,"\f");
  replace=replace.replace(/\\n/g,"\n");
  replace=replace.replace(/\\r/g,"\r");
  replace=replace.replace(/\\t/g,"\t");
  replace=replace.replace(/\\v/g,"\v");
  replace=replace.replace(/\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}/g,
    function($0,$1,$2){
      return String.fromCharCode(parseInt("0x"+$0.substring(2)));
    }
  );
  replace=replace.replace(/\\B/g,"\\");
}
var search=new RegExp(search,options);

if (srcVar) {
  WScript.Stdout.Write(env(args.Item(3)).replace(search,replace));
} else {
  while (!WScript.StdIn.AtEndOfStream) {
    if (multi) {
      WScript.Stdout.Write(WScript.StdIn.ReadAll().replace(search,replace));
    } else {
      WScript.Stdout.WriteLine(WScript.StdIn.ReadLine().replace(search,replace));
    }
  }
}
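The L option works by escaping regex metacharacters in Search and doubling every $ in Replace. This sketch lifts those two replace calls out of the JScript portion so they can be tried in Node.js. The function names and the sample input are made up.

```javascript
// The two escaping steps REPL.BAT applies for the L (literal) option.
function literalSearch(search) {
  // Backslash-escape regex metacharacters, as in the JScript portion.
  return new RegExp(search.replace(/([.^$*+?()[{\\|])/g, "\\$1"), "g");
}

function literalReplace(replace) {
  // "$$" in a replacement means one literal "$", so every "$" becomes "$$".
  return replace.replace(/\$/g, "$$$$");
}

const input = "cost: $5 (approx.)";
const output = input.replace(literalSearch("$5 (approx.)"), literalReplace("$1.00"));
console.log(output); // cost: $1.00
```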