Split a text file depending on whether two consecutive lines are almost identical

Time: 2015-06-03 19:54:35

Tags: windows batch-file command-line text-files

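I need to split a text file (with a .bat script) by comparing a substring of the previous line (positions 2 to 13) with the same substring of the current line (positions 2 to 13).

Let me explain.

My file looks like this: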

IA1234567890A         XX33              AZE
bla1                  XX34              DES
bla2                  XX34              DES
bla3                  XX34              DES
FA1234567890A         XX35              AZE
IA1234567890A         XX36              AZE
bla4                  XX34              DES
bla5                  XX34              DES
bla6                  XX34              DES
FA1234567890A         XX37              AZE
IB0987654321A         XX38              AZE
bla7                  XX34              DES
bla8                  XX34              DES
bla9                  XX34              DES
FB0987654321A         XX39              AZE

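I want to split the file before a line starting with "I" whenever its first 12 characters (not counting the "I") differ from the first 12 characters of the previous line. Except for the first line of the file, that previous line always starts with "F", and the comparison should not take the "F" into account either.

So I would not split the file between these two lines: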

FA1234567890A         XX35              AZE
IA1234567890A         XX36              AZE

But I would split the file between these two lines:

FA1234567890A         XX37              AZE
IB0987654321A         XX38              AZE
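
To make the rule concrete, here is a minimal sketch of the comparison on its own. It is written in Bash rather than batch, purely as an illustration of the rule and not as the .bat solution being asked for. The function name should_split is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (exit status 0) when the file should be
# split between the two given lines, and fails otherwise.
should_split() {
  local prev=$1 cur=$2
  # Only an "F" line followed by an "I" line can be a split point.
  [[ ${prev:0:1} == F && ${cur:0:1} == I ]] || return 1
  # Compare the 12 characters after the marker (positions 2 to 13).
  [[ ${prev:1:12} != "${cur:1:12}" ]]
}

should_split "FA1234567890A         XX35              AZE" \
             "IA1234567890A         XX36              AZE" && echo split || echo keep
should_split "FA1234567890A         XX37              AZE" \
             "IB0987654321A         XX38              AZE" && echo split || echo keep
```

With the two pairs of lines shown above, the first call prints keep and the second prints split.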

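I know how to split a file on a delimiter, but I'm completely lost with this comparison...

If one of you could help me with this tricky problem, I would really appreciate it...

Thanks!

3 answers:

Answer 0: (score: 1)

This reads data.txt and creates output1.txt, output2.txt, ... outputn.txt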

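@echo off
setlocal enabledelayedexpansion

rem Number of the current output file, and the key of the previous line
set outputcount=0
set "previousblock="

for /f "delims=" %%s in (data.txt) do (
  set "line=%%s"
  rem Key = the 12 characters after the first one (positions 2 to 13)
  set "currentblock=!line:~1,12!"

  rem Start a new output file when an "I" line has a different key
  rem than the line before it (this is also true for the first line)
  if "!line:~0,1!" EQU "I" (
    if "!previousblock!" NEQ "!currentblock!" (
      set /A outputcount+=1
    )
  )

  rem Redirection goes first so nothing in the line is read as a handle
  >>"output!outputcount!.txt" echo(!line!
  rem Quoted so that no trailing spaces end up in the value
  set "previousblock=!currentblock!"
)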

For example:

D:\scripts>splitfile.bat
D:\scripts>type output*

output1.txt


IA1234567890A         XX33              AZE
bla1                  XX34              DES
bla2                  XX34              DES
bla3                  XX34              DES
FA1234567890A         XX35              AZE
IA1234567890A         XX36              AZE
bla4                  XX34              DES
bla5                  XX34              DES
bla6                  XX34              DES
FA1234567890A         XX37              AZE

output2.txt


IB0987654321A         XX38              AZE
bla7                  XX34              DES
bla8                  XX34              DES
bla9                  XX34              DES
FB0987654321A         XX39              AZE

Edit

Updated the code so that it works correctly.

Answer 1: (score: 0)

Try this:

#!/bin/bash

## Remove any split files created by previous runs
rm -f split.*

## ct = number of the next line to read, cnt = index of the current split.X file, file = current output file name
ct=2
cnt=1
file="split.$cnt"

## IFS= keeps leading and trailing spaces, -r keeps backslashes literal
while IFS= read -r lineP
do
  ## Read the following line and increment ct (sed rescans the file on every pass, so this is slow on large inputs)
  lineN="$(sed -n "${ct}p" inputfile.txt)" && ((ct++))

  ## First character of both lines, then the 12 characters after it
  lineP121=${lineP:0:1} && lineN121=${lineN:0:1}
  lineP1212=${lineP:1:12} && lineN1212=${lineN:1:12}

  ## Split after an "F" line when the next line is an "I" line with a different key
  if [[ "$lineP1212" != "$lineN1212" && ( "$lineP121" == "F" && "$lineN121" == "I" ) ]]
  then
    ## Last line of the current file: written with a trailing ":" marker, then switch to the next file
    echo "${lineP}:" >> "$file"
    ((++cnt))
    file="split.$cnt"
  else
    ## Note: echo -e with "\n" writes an extra blank line after each line
    echo -e "$lineP\n" >> "$file"
  fi
done < inputfile.txt

echo -e "\n\nFile created are (with contents in split.X files):\n\n"
ls -l split.* && echo && grep -n . split.* && echo

The output: 2 files are created, split.1 and split.2 (for this input file).

Files created (contents of the split.X files as printed by grep -n; you can use plain cat instead):


-rw-r--r-- 1 koba loki 450 Jun  3 19:01 split.1
-rw-r--r-- 1 koba loki 225 Jun  3 19:01 split.2

split.1:1:IA1234567890A         XX33              AZE
split.1:3:bla1                  XX34              DES
split.1:5:bla2                  XX34              DES
split.1:7:bla3                  XX34              DES
split.1:9:FA1234567890A         XX35              AZE
split.1:11:IA1234567890A         XX36              AZE
split.1:13:bla4                  XX34              DES
split.1:15:bla5                  XX34              DES
split.1:17:bla6                  XX34              DES
split.1:19:FA1234567890A         XX37              AZE:

split.2:1:IB0987654321A         XX38              AZE
split.2:3:bla7                  XX34              DES
split.2:5:bla8                  XX34              DES
split.2:7:bla9                  XX34              DES
split.2:9:FB0987654321A         XX39              AZE
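
The script above calls sed once per input line to fetch the following line. A possible single-pass variant remembers the previous line instead. This is only a sketch under a few assumptions: the function name split_file is made up, the split.N file names are taken from the script above, and, unlike that script, lines are written unchanged (no ":" marker and no extra blank lines).

```shell
#!/bin/bash
# Single-pass sketch: compare each line with the one read just before it.
# split_file is a hypothetical name; it writes split.1, split.2, ...
# into the current directory.
split_file() {
  local input=$1 prev="" cnt=1 line
  rm -f split.*
  while IFS= read -r line || [[ -n $line ]]; do
    # Start a new file when an "I" line follows an "F" line whose
    # 12-character key (offset 1, length 12) is different.
    if [[ ${prev:0:1} == F && ${line:0:1} == I && ${prev:1:12} != "${line:1:12}" ]]; then
      cnt=$((cnt + 1))
    fi
    printf '%s\n' "$line" >> "split.$cnt"
    prev=$line
  done < "$input"
}
```

It would be called as split_file inputfile.txt. Because the comparison happens before the current line is written, the "I" line that opens a new section lands in the new file.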

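Answer 2: (score: 0)

If the input file is large, this method should run faster because it does not check every line. It also correctly handles lines containing special batch characters.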

@echo off
setlocal EnableDelayedExpansion

rem Read the first line, and create a dummy previous "endLine" with same name
set /P "endName=" < test.txt
set "endName=F%endName:~1%"
set startLine=1
set "startName="

rem Redirect the input file to a code block, in order to read it
< test.txt (

   rem Locate all lines that start with "I" or "F"
   for /F "tokens=1,2 delims=: " %%a in ('findstr /N /B "I F" test.txt') do (
      if not defined startName (
         set "startName=%%b"
         if "!startName:~1,12!" neq "!endName:~1,12!" (
            rem New section starts: copy it to its own file
            set /A lines=endLine-startLine+1
            (for /L %%i in (1,1,!lines!) do (
               set /P "line="
               echo !line!
            )) > "Part !endName:~1,12!.txt"
            set "endName=F!startName:~1!"
            set "startLine=%%a"
         )
      ) else (
         set "endLine=%%a"
         set "endName=%%b"
         set "startName="
      )
   )

   rem Copy last section to its own file
   findstr "^" > "Part !endName:~1,12!.txt"
)

Output:

C:\> type Part*.txt

Part A1234567890A.txt


IA1234567890A         XX33              AZE
bla1                  XX34              DES
bla2                  XX34              DES
bla3                  XX34              DES
FA1234567890A         XX35              AZE
IA1234567890A         XX36              AZE
bla4                  XX34              DES
bla5                  XX34              DES
bla6                  XX34              DES
FA1234567890A         XX37              AZE

Part B0987654321A.txt


IB0987654321A         XX38              AZE
bla7                  XX34              DES
bla8                  XX34              DES
bla9                  XX34              DES
FB0987654321A         XX39              AZE
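
The idea of looking only at the lines that start with "I" or "F" can also be sketched in Bash, for readers more comfortable with that shell. This is an illustration, not a port of the batch file above: the function name split_points is made up, and it assumes, as in the sample data, that each "F" line is immediately followed by an "I" line and that no other line starts with "I" or "F".

```shell
#!/bin/bash
# Print the line numbers at which a new part starts, inspecting only the
# marker lines (those beginning with "I" or "F").
# split_points is a hypothetical name.
split_points() {
  local input=$1 num text prev=""
  # grep -n prefixes each matching line with "<line number>:".
  grep -n '^[IF]' "$input" | while IFS=: read -r num text; do
    # A part starts at an "I" line whose 12-character key differs from
    # the key of the preceding "F" marker line.
    if [[ ${prev:0:1} == F && ${text:0:1} == I && ${prev:1:12} != "${text:1:12}" ]]; then
      echo "$num"
    fi
    prev=$text
  done
}
```

For the sample file in the question this prints 11, the line where the B0987654321A section begins. The resulting line numbers could then be passed to a tool such as sed or head to copy each range into its own file.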