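A real merge in a .bat file

Time: 2012-07-02 10:29:28

Tags: batch-file

How can I merge two text files in a .bat file? Or, at the very least, how can I read the next line of a file and test for end of file in a .bat file?

Is it possible to merge two text files with a .bat script? The idea is not to append or concatenate, but to perform a merge driven by the content of each line. A simple example is producing one sorted file from two already sorted files, as in the pseudocode below (pseudocode because I cannot find a way to read the next line and test for end of file outside of a for loop):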

:TOP
 set /p Line1 = Read_Line(file1)
 set /p Line2 = Read_Line(file2)
:TEST
 IF EOF(file1) GOTO FINISH2
 IF EOF(file2) GOTO FINISH1
 IF %Line1% < %Line2%
        (echo %Line1% - not in 2 >> File3
        set /p Line1 = Read_Line(file1)
        GOTO TEST)
 ELSE IF %Line1% > %Line2%
        (echo %Line2% - not in 1 >> File3
        set /p Line2 = Read_Line(file2)
        GOTO TEST)
 ELSE echo %Line1% in both >> File3
 GOTO TOP
:FINISH1
 REM file2 is exhausted: write out the rest of file1
        echo %Line1% - not in 2 >> File3
        set /p Line1 = Read_Line(file1)
        IF NOT EOF(file1) GOTO FINISH1
        GOTO EOF
:FINISH2
 REM file1 is exhausted: write out the rest of file2
        echo %Line2% - not in 1 >> File3
        set /p Line2 = Read_Line(file2)
        IF NOT EOF(file2) GOTO FINISH2

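I tried using for loops, but branching inside the loop seems to stop the loop. I have tried all sorts of things (including a parallel .bat) to find a way to advance a cursor through the file using set <, but I could not work out how to do it properly.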

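2 answers:

Answer 0: (score: 2)

Batch really is a terrible "language" for text processing. Almost any other tool you can find will be better than batch (easier to develop, faster to execute). I am providing a batch solution because I like the challenge, but I would always recommend some other language or tool over batch for text processing. That said...

This assumes both source files are already sorted.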

@echo off
setlocal enableDelayedExpansion

::define the files
set "in1=file1.txt"
set "in2=file2.txt"
set "out=file3.txt"

::define some simple macros
set "eof1=^!ln1^! gtr ^!cnt1^!"
set "eof2=^!ln2^! gtr ^!cnt2^!"
set "read1=if ^!ln1^! leq ^!cnt1^! set "txt1=" & <&3 set /p "txt1=" & set /a ln1+=1"
set "read2=if ^!ln2^! leq ^!cnt2^! set "txt2=" & <&4 set /p "txt2=" & set /a ln2+=1"
set "write1=echo(^!txt1^! - not in 2"
set "write2=echo(^!txt2^! - not in 1"
set "writeBoth=echo(^!txt1^! - in both"

::count the number of lines in each file
for /f %%N in ('find /v /c "" ^<"%in1%"') do set "cnt1=%%N"
for /f %%N in ('find /v /c "" ^<"%in2%"') do set "cnt2=%%N"

::setup redirection in outer block and merge the files in a loop
::The max number of iterations assumes there is no overlap (cnt1+cnt2)
::Break out of the loop as soon as both files have reached EOF.
set /a ln1=0, ln2=0, cnt=cnt1+cnt2
4<"%in2%" 3<"%in1%" (
  %read1%
  %read2%
  for /l %%N in (1 1 %cnt%) do (
    if %eof1% (
        if %eof2% goto :break
        %write2%
        %read2%
    ) else if %eof2% (
        %write1%
        %read1%
    ) else if .!txt1! lss .!txt2! (
        %write1%
        %read1%
    ) else if .!txt2! lss .!txt1! (
        %write2%
        %read2%
    ) else (
        %writeBoth%
        %read1%
        %read2%
    )
  )
) >"%out%"
:break
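
To try the script above, one could first create two small sorted inputs. The sketch below is only a test setup; the fruit names are made up for illustration and are not part of the original answer.

```bat
@echo off
rem Hypothetical sample data, already sorted, written to the
rem file names that the merge script expects.
(
  echo apple
  echo banana
  echo cherry
)>file1.txt
(
  echo banana
  echo date
)>file2.txt
```

Tracing the merge script by hand with these inputs, file3.txt should end up with four lines: `apple - not in 2`, `banana - in both`, `cherry - not in 2` and `date - not in 1`.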

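Reading a file with SET /P has the following limitations:

  • Lines in both files must be terminated by <carriage return><line feed> (Windows style). It will not work with lines terminated by <line feed> alone (Unix style).
  • Each line is limited to 1021 bytes (characters), not including the line terminator.
  • Trailing control characters are stripped from each line.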
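
The read technique the script relies on can be isolated into a minimal sketch: redirect a file into a parenthesized block once, and each SET /P inside that block then consumes the next line. The variable names `in`, `cnt` and `line` are illustrative assumptions; the line count is taken up front, as in the script above, so the loop knows when to stop.

```bat
@echo off
setlocal enableDelayedExpansion
rem Hypothetical input file name
set "in=file1.txt"

rem Count the lines first so the loop knows where the file ends
for /f %%N in ('find /v /c "" ^<"%in%"') do set "cnt=%%N"

rem Redirect stdin once for the whole block; every SET /P
rem inside it continues from where the previous one stopped.
<"%in%" (
  for /l %%I in (1 1 %cnt%) do (
    set "line="
    set /p "line="
    echo(%%I: !line!
  )
)
```

The merge script does the same thing for two files at once by using handles 3 and 4 instead of stdin, which is why it reads with `<&3 set /p` and `<&4 set /p`.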

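Edit

If you simply want to create a sorted, merged document with no duplicates, then I think the following is an optimized version of sean's approach. It is not as elegant as his, but I believe it is much faster. By setting the EOL option to <line feed>, it also allows each line to begin with any character. Note that this solution strips all blank lines from the output (as does sean's). Additional code could be added to preserve one blank line.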

@echo off
setlocal disableDelayedExpansion
set lf=^


::above 2 blank lines required
copy /b file1.txt+file2.txt file3.txt >nul
set "old="
(
  for /f eol^=^%lf%%lf%^ delims^= %%A in ('sort file3.txt') do (
    set "new=.%%A"
    setlocal enableDelayedExpansion
    if "!old!" neq "!new!" echo(!new:~1!
    for /f "delims=" %%B in ("!new!") do (
      endlocal
      set "old=%%B"
    )
  )
)>file4.txt

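Answer 1: (score: 1)

Two steps (no sorting is needed, because the find in step 2 checks the new file and only writes a line if it is not already there):

  1. Merge the files:
    copy file1.txt+file2.txt file3.txt

  2. Remove duplicate lines (/i ignores case; omit it if Fred and FRED should be treated as different):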

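    @echo off
    rem Note: find matches substrings, so a line that is contained in
    rem an earlier line is treated as a duplicate.
    for /f "tokens=* delims=" %%a in (file3.txt) do (
      find /i "%%a" file4.txt >nul 2>&1
      if errorlevel 1 >>file4.txt echo(%%a
      )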
    
  3. The resulting file is file4.txt
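
Because find matches any line that contains the search text, step 2 can drop a line that is merely a substring of an earlier one. A possible variation, offered as a sketch rather than as part of the original answer, swaps in findstr with /x (whole-line match), /l (literal text) and /c: (search string). It assumes the lines contain no double quotes or backslashes, which findstr may need escaped in search strings.

```bat
@echo off
rem Start from an empty output file so findstr always has a file to search
type nul >file4.txt

rem Like the original loop, for /f skips empty lines.
for /f "usebackq delims=" %%a in ("file3.txt") do (
  rem /x: whole line must match, /l: literal text, /i: ignore case
  findstr /x /l /i /c:"%%a" file4.txt >nul || >>file4.txt echo(%%a
)
```

As in the original step 2, omit /i if Fred and FRED should count as different lines.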