Find a string with special characters in a text file and add a line break before each occurrence

Time: 2015-12-07 08:19:41

Tags: string batch-file replace

I have a text file that consists of one long string, like this:

ISA*00*GARBAGE~ST*TEST*TEST~CLP*TEST~ST*TEST*TEST~CLP*TEST~ST*TEST*TEST~CLP*TEST~GE*GARBAGE*~   

I need it to look like this:

~ST*TEST*TEST~CLP*TEST
~ST*TEST*TEST~CLP*TEST
~ST*TEST*TEST~CLP*TEST

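I first tried to split the string by adding a line break at each ~ST, but I can't for the life of me make that happen. I have tried all sorts of scripts, and I thought a find/replace script would work best.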

@echo off
setlocal enabledelayedexpansion
set INTEXTFILE=test.txt
set OUTTEXTFILE=test_out.txt
set SEARCHTEXT=~ST
set REPLACETEXT=~ST

for /f "tokens=1,* delims=~" %%A in ( '"type %INTEXTFILE%"') do (
    SET string=%%A
    SET modified=!string:%SEARCHTEXT%=%REPLACETEXT%!

    echo !modified! >> %OUTTEXTFILE%
)
del %INTEXTFILE%
rename %OUTTEXTFILE% %INTEXTFILE%

Found here: How to replace substrings in windows batch file

But I am stuck because (1) the special character ~ makes the code not work at all. It gives me this result:

string:~ST=~ST

If I put quotes around "~ST", the code does nothing at all. (2) I can't figure out how to add a line break before ~ST.

The final task is to delete the ISA*00*blahblahblah and ~GE*blahblahblah lines after all of the splits have been performed. But I am still stuck on splitting at ~ST.

Any suggestions?

3 Answers:

Answer 0: (score: 3)

@echo off
setlocal EnableDelayedExpansion

rem Set the next variable to the number of "~" chars that delimit the wanted fields, or more
set "maxTokens=7"
rem Define the delimiters that start a new field
set "delims=/ST/GE/"

for /F "delims=" %%a in (test.txt) do (
   set "line=%%a"
   set "field="
   rem Process up to maxTokens per line;
   rem this is a trick to avoid calling a subroutine that has a goto loop
   for /L %%i in (0,1,%maxTokens%) do if defined line (
      for /F "tokens=1* delims=~" %%b in ("!line!") do (
         rem Get the first token in the line separated by "~" delimiter
         set "token=%%b"
         rem ... and update the rest of the line
         set "line=%%c"
         rem Get the first two chars after "~" token like "ST", "CL" or "GE";
         rem                            if they are "ST" or "GE":
         for %%d in ("!token:~0,2!") do if "!delims:/%%~d/=!" neq "%delims%" (
            rem Start a new field: show previous one, if any
            if defined field echo !field!
            if "%%~d" equ "ST" (
               set "field=~%%b"
            ) else (
               rem It is "GE": cancel rest of line
               set "line="
            )
         ) else (
            rem It is "CL" token: join it to current field, if any
            if defined field set "field=!field!~%%b"
         )
      )
   )
)

Input:

ISA*00*GARBAGE~ST*TEST1*TEST1~CLP*TEST1~ST*TEST2*TEST2~CLP*TEST2~ST*TEST3*TEST3~CLP*TEST3~GE*GARBAGE*~CLP~TESTX

Output:

~ST*TEST1*TEST1~CLP*TEST1
~ST*TEST2*TEST2~CLP*TEST2
~ST*TEST3*TEST3~CLP*TEST3
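
The script above leans on two small tricks: peeling the first "~"-delimited token off a line with for /F "tokens=1* delims=~", and testing whether a two-letter prefix appears in a "/"-separated list by checking whether removing it changes the list. A minimal sketch of both in isolation follows; the sample line and the variable name starters are hypothetical and chosen only for illustration.

```bat
@echo off
setlocal EnableDelayedExpansion

rem Hypothetical sample values, chosen only for illustration
set "line=ST*A~CLP*B~GE*C"
set "starters=/ST/GE/"

rem Peel off the first "~"-delimited token; %%c receives the untouched rest
for /F "tokens=1* delims=~" %%b in ("!line!") do (
   set "token=%%b"
   set "line=%%c"
)
echo token=!token!
echo rest=!line!

rem Membership test: removing "/ST/" changes the list only if "ST" is in it
for %%d in ("!token:~0,2!") do (
   if "!starters:/%%~d/=!" neq "%starters%" (echo %%~d starts a new field) else (echo %%~d continues the field)
)
```

With these sample values the first token should be ST*A, the rest CLP*B~GE*C, and the prefix ST should be reported as starting a new field.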

Answer 1: (score: 0)

~ cannot be used as the first character of the search string in the substring replacement syntax %VARIABLE:SEARCH_STRING=REPLACE_STRING%, because it marks the substring expansion syntax %VARIABLE:~POSITION,LENGTH% (type set /? for more information).

Assuming your text file contains only a single line of text and does not exceed a size of about 8 KB, I see the following option to accomplish your task. The script uses the substring replacement syntax %VARIABLE:*SEARCH_STRING=REPLACE_STRING%; the * matches everything up to and including the first occurrence of SEARCH_STRING:

@echo off
setlocal EnableExtensions EnableDelayedExpansion

rem initialise constants:
set "INFILE=test_in.txt"
set "OUTFILE=test_out.txt"
set "SEARCH=ST"
set "TAIL=GE"

rem read single-line file content into variable:
< "%INFILE%" set /P "DATA="

rem remove everything before first `~%SEARCH%`:
set "DATA=~%SEARCH%!DATA:*~%SEARCH%=!"

rem call sub-routine, redirect its output:
> "%OUTFILE%" call :LOOP

endlocal
goto :EOF

:LOOP
rem extract portion right to first `~%SEARCH%`:
set "RIGHT=!DATA:*~%SEARCH%=!"
rem skip rest if no match found:
if "!RIGHT!"=="!DATA!" goto :TAIL
rem extract portion left to first `~%SEARCH%`, including `~`:
set "LEFT=!DATA:%SEARCH%%RIGHT%=!"
rem the last character must be a `~`;
rem so remove it; `echo` outputs a trailing line-break;
rem the `if` avoids an empty line at the beginning;
rem the unwanted part at the beginning is removed implicitly:
if not "!LEFT:~,-1!"=="" echo(!LEFT:~,-1!
rem output `~%SEARCH%` without trailing line-break:
< nul set /P "DUMMY=~%SEARCH%"
rem store remainder for next iteration:
set "DATA=!RIGHT!"
rem loop back if remainder is not empty:
if not "!DATA!"=="" goto :LOOP

:TAIL
rem this section removes the part starting at `~%TAIL%`:
set "RIGHT=!DATA:*~%TAIL%=!"
if "!RIGHT!"=="!DATA!" goto :EOF
set "LEFT=!DATA:%TAIL%%RIGHT%=!"
rem output part before `~%TAIL%` without trailing line-break:
< nul set /P "DUMMY=!LEFT:~,-1!"
goto :EOF

The following restrictions apply to this approach:

  • the input file contains a single line;
  • the size of the input file does not exceed about 8 kBytes;
  • there is exactly one instance of ~GE, and it occurs after all instances of ~ST;
  • there is always at least one character between two adjacent instances of ~ST;
  • no special characters occur in the file, such as SPACE, TAB, ", %;
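
To see the * form of the substitution on its own, here is a minimal sketch. The sample string and the variable names DATA and REST are hypothetical and not taken from a real file. Because the search string starts with * rather than ~, the ~ that follows is treated as ordinary search text.

```bat
@echo off
setlocal EnableDelayedExpansion

rem Hypothetical sample data, not taken from a real file
set "DATA=ISA*00*GARBAGE~ST*A~CLP*B~GE*END"

rem "*~ST" matches everything up to and including the first "~ST",
rem so REST holds only what follows that first "~ST"
set "REST=!DATA:*~ST=!"

rem Put the removed "~ST" back in front of the remainder
echo(~ST!REST!
```

For this sample the script should print ~ST*A~CLP*B~GE*END, that is, the leading ISA part is dropped while the rest stays untouched.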

Answer 2: (score: 0)

Don't reinvent the wheel; use a regular expression replacement tool such as sed or JREPL.BAT.

call jrepl "^.*?~ST(.+?)~GE.*$" "'~ST'+$1.replace(/~ST/g,'\r\n$&')" /jmatch <in.txt >out.txt