How can a Windows batch file correctly read data from a delimited text file when some fields are empty?

Time: 2015-01-07 23:59:38

Tags: windows batch-file delimited-text

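I have a comma-delimited text file containing three fields. The first always contains a string, but the second, the third, or both may be empty. When I read the file with the FOR command, I get the expected results when all fields contain strings, when only the third is empty, and when both the second and third are empty: variables read from fields that contain strings equal those strings, and variables read from empty fields have empty values. However, when the second field is empty and the third field contains a string, I get unexpected results: the second variable, the one that should be read from the second field, equals the contents of the third field, and the third variable has an empty value.

How can I fix this?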

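2 Answers:

Answer 0 (score: 2)

This information is copied verbatim from my DosTips post: Safely parse nearly any CSV with parseCSV.bat

It is very common for someone to want to parse a CSV using FOR /F. This is a simple task if you know that all columns are populated and that there are no commas, newlines, or quotes within the values. Assuming there are 4 columns: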

@echo off
for /f "tokens=1-4 delims=," %%A in (test.csv) do (
  echo ----------------------
  echo A=%%~A
  echo B=%%~B
  echo C=%%~C
  echo D=%%~D
  echo(
)

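But things get more difficult if any of the following conditions occur:

1) Values may be empty, with consecutive commas. FOR /F treats consecutive delimiters as one, so it throws off the column assignment.

2) Quoted values may contain commas. FOR /F will incorrectly treat a quoted comma as a column delimiter.

3) Quoted values may contain newlines. FOR /F will break the line at the newline and incorrectly treat the one row as two.

4) Quoted values may contain paired quotes that represent one quote.
For example, "He said, ""Hello there""". A method is needed to convert "" into ".

Then there are secondary issues that may arise if delayed expansion is enabled.

5) A FOR variable such as %%A will be corrupted if it contains ! (or sometimes ^) and delayed expansion is enabled when the variable is expanded.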
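Problem 1 is the one the question runs into. As a rough illustration only, the sketch below models FOR /F token splitting in plain JavaScript (the same language family as the JScript used in parseCSV.bat). The function name forFTokens is hypothetical, and this is a simplified model rather than a description of how cmd.exe is implemented: it treats any run of delimiters as a single separator, which is why a value shifts into the wrong variable when a field is empty.

```javascript
// Simplified model of FOR /F "delims=," token splitting (an illustrative
// assumption, not cmd.exe's actual code): empty pieces between consecutive
// delimiters are dropped, so later values shift left.
function forFTokens(line, delim) {
  var parts = line.split(delim), tokens = [], i;
  for (i = 0; i < parts.length; i++) {
    if (parts[i] !== "") tokens.push(parts[i]);
  }
  return tokens;
}

var full  = forFTokens("value1,value2,value3", ",");
var empty = forFTokens("value1,,value3", ",");
console.log(full.join(" | "));   // three tokens
console.log(empty.join(" | "));  // only two tokens: value3 lands in the 2nd slot
```

In the model, the second call returns only two tokens, which matches the symptom in the question: the second variable receives the third field and the third variable stays empty.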

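Some of these issues have fairly simple solutions, but solving all of them with pure batch is very difficult (and slow).

I have written a hybrid JScript/batch utility called parseCSV.bat that makes it easy to correctly parse nearly any CSV file with FOR /F.

parseCSV.bat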

@if (@X)==(@Y) @end /* harmless hybrid line that begins a JScript comment

::************ Documentation ***********
::parseCSV.bat version 1.0
:::
:::parseCSV  [/option]...
:::
:::  Parse stdin as CSV and write it to stdout in a way that can be safely
:::  parsed by FOR /F. All columns will be enclosed by quotes so that empty
:::  columns may be preserved. It also supports delimiters, newlines, and
:::  quotes within quoted values. Two consecutive quotes within a quoted value
:::  are converted into one quote.
:::
:::  Available options:
:::
:::    /I:string = Input delimiter. Default is a comma.
:::
:::    /O:string = Output delimiter. Default is a comma.
:::
:::    /E = Encode output delimiter in value as \D
:::         Encode newline in value as \N
:::         Encode backslash in value as \S
:::
:::    /D = Escape exclamation point and caret for delayed expansion
:::         ! becomes ^!
:::         ^ becomes ^^
:::
:::parseCSV  /?
:::
:::  Display this help
:::
:::parseCSV  /V
:::
:::  Display the version of parseCSV.bat
:::
:::parseCSV.bat was written by Dave Benham. Updates are available at the original
:::posting site: http://www.dostips.com/forum/viewtopic.php?f=3&t=5702
:::

::************ Batch portion ***********
@echo off
if "%~1" equ "/?" (
  setlocal disableDelayedExpansion
  for /f "delims=: tokens=*" %%A in ('findstr "^:::" "%~f0"') do echo(%%A
  exit /b 0
)
if /i "%~1" equ "/V" (
  for /f "delims=:" %%A in ('findstr /bc:"::%~nx0 version " "%~f0"') do echo %%A
  exit /b 0
)
cscript //E:JScript //nologo "%~f0" %*
exit /b 0


************ JScript portion ***********/
var args     = WScript.Arguments.Named,
    stdin    = WScript.Stdin,
    stdout   = WScript.Stdout,
    escape   = args.Exists("E"),
    delayed  = args.Exists("D"),
    inDelim  = args.Exists("I") ? args.Item("I") : ",",
    outDelim = args.Exists("O") ? args.Item("O") : ",",
    quote    = false,
    ln, c, n;
while (!stdin.AtEndOfStream) {
  ln=stdin.ReadLine();
  if (!quote) stdout.Write('"');
  for (n=0; n<ln.length; n++ ) {
    c=ln.charAt(n);
    if (c == '"') {
      if (quote && ln.charAt(n+1) == '"') {
        n++;
      } else {
        quote=!quote;
        continue;
      }
    }
    if (c == inDelim && !quote) c='"'+outDelim+'"';
    if (escape) {
      if (c == outDelim) c="\\D";
      if (c == "\\") c="\\S";
    }
    if (delayed) {
      if (c == "!") c="^!";
      if (c == "^") c="^^";
    }
    stdout.Write(c);
  }
  stdout.Write( (quote) ? ((escape) ? "\\N" : "\n") : '"\n' );
}
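
To see why the quoting preserves empty columns, here is a minimal sketch of the same per-character logic as a pure function for a single line, without the /E and /D options and without values that span lines. The name quoteColumns is hypothetical and not part of parseCSV.bat. It is written in plain JavaScript; console.log is a Node.js assumption (under cscript, WScript.Echo would be the equivalent).

```javascript
// Sketch of the core loop for one input line (hypothetical helper).
// Every column is wrapped in quotes so no two delimiters end up adjacent.
function quoteColumns(line, inDelim, outDelim) {
  var out = '"', quote = false, n, c;
  for (n = 0; n < line.length; n++) {
    c = line.charAt(n);
    if (c == '"') {
      if (quote && line.charAt(n + 1) == '"') {
        n++;              // "" inside a quoted value: emit a single quote
      } else {
        quote = !quote;   // opening or closing quote: drop it
        continue;
      }
    }
    if (c == inDelim && !quote) c = '"' + outDelim + '"';
    out += c;
  }
  return out + '"';
}

console.log(quoteColumns('value1,,value3,value4', ',', ','));
// "value1","","value3","value4"
```

Because every column is wrapped in quotes, the empty second column becomes the token "" instead of disappearing, and the ~ modifier in %%~B strips the quotes to yield an empty value.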

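I also wrote a script that defines a macro to help with parsing the most problematic CSV files. See http://www.dostips.com/forum/viewtopic.php?f=3&t=1827 for background information on batch macros with arguments.

define_csvGetCol.bat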

::define_csvGetCol.bat version 1.1
::
:: Defines variable LF and macro csvGetCol to be used with
:: parseCSV.bat to parse nearly any CSV file.
::
:: This script must be called with delayedExpansion disabled.
::
:: The %csvGetCol% macro must be used with delayedExpansion enabled.
::
:: Example usage:
::
::   @echo off
::   setlocal disableDelayedExpansion
::   call define_csvGetCol
::   setlocal enableDelayedExpansion
::   for /f "tokens=1-3 delims=," %%A in ('parseCSV /d /e ^<test.csv') do (
::     %== Load and decode column values ==%
::     %csvGetCol% A "," %%A
::     %csvGetCol% B "," %%B
::     %csvGetCol% C "," %%C
::     %== Display the result ==%
::     echo ----------------------
::     for %%V in (A B C) do echo %%V=!%%V!
::     echo(
::   )
::
:: Written by Dave Benham
::

:: Delayed expansion must be disabled during macro definition

:: Define LF to contain a linefeed (0x0A) character
set ^"LF=^

^" The empty line above is critical - DO NOT REMOVE

:: define a newline with line continuation
set ^"\n=^^^%LF%%LF%^%LF%%LF%^^"

:: Define csvGetCol
:: %csvGetCol%  envVarName  "Delimiter"  FORvar
set csvGetCol=for %%# in (1 2) do if %%#==2 (%\n%
setlocal enableDelayedExpansion^&for /f "tokens=1,2*" %%1 in ("!args!") do (%\n%
  endlocal^&endlocal%\n%
  set "%%1=%%~3"!%\n%
  if defined %%1 (%\n%
    for %%L in ("!LF!") do set "%%1=!%%1:\N=%%~L!"%\n%
    set "%%1=!%%1:\D=%%~2!"%\n%
    set "%%1=!%%1:\S=\!"%\n%
  )%\n%
)) else setlocal disableDelayedExpansion ^& set args=


Usage is very simple if you know that there are no commas or newlines in any value, and delayed expansion is not needed:

test1.csv

"value1 with ""quotes""",value2: No problem!,value3: 2^3=8,value4: (2^2)!=16
value1,,value3,value4
value1,,,value4
value1,,,
,,,value4

test1.bat - No delayed expansion, no commas or newlines in values

@echo off
for /f "tokens=1-4 delims=," %%A in ('parseCSV ^<test1.csv') do (
  echo -------------
  echo(A=%%~A
  echo(B=%%~B
  echo(C=%%~C
  echo(D=%%~D
  echo(
)

- OUTPUT1 -

-------------
A=value1 with "quotes"
B=value2: No problem!
C=value3: 2^3=8
D=value4: (2^2)!=16

-------------
A=value1
B=
C=value3
D=value4

-------------
A=value1
B=
C=
D=value4

-------------
A=value1
B=
C=
D=

-------------
A=
B=
C=
D=value4

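It is also simple when values contain commas, as long as you know of a character that does not appear in any value. Just specify that unique character as the output delimiter.

test2.csv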

"value1 with ""quotes""","value2, No problem!","value3, 2^3=8","value4, (2^2)!=16"
value1,,value3,value4
value1,,,value4
value1,,,
,,,value4

test2.bat - No delayed expansion, no newlines or pipes in values. Note that the entire option must be quoted if the delimiter is a poison character.

@echo off
for /f "tokens=1-4 delims=|" %%A in ('parseCSV "/o:|" ^<test2.csv') do (
  echo -------------
  echo(A=%%~A
  echo(B=%%~B
  echo(C=%%~C
  echo(D=%%~D
  echo(
)

- OUTPUT2 -

-------------
A=value1 with "quotes"
B=value2, No problem!
C=value3, 2^3=8
D=value4, (2^2)!=16

-------------
A=value1
B=
C=value3
D=value4

-------------
A=value1
B=
C=
D=value4

-------------
A=value1
B=
C=
D=

-------------
A=
B=
C=
D=value4

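If values may contain newlines, or if you do not know of a character that never appears in any value, then a bit more code is required. This solution encodes newline, delimiter, and backslash as \N, \D, and \S. Delayed expansion is needed within the loop to decode the values, so ! and ^ must be escaped as ^! and ^^.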
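
The encoding is reversible because, after /E, every backslash in the output starts one of the three two-character codes. Below is a minimal JavaScript sketch of the round trip, assuming a comma as the output delimiter. The names encodeValue and decodeValue are hypothetical; in the batch examples the decoding is done with delayed-expansion search and replace instead.

```javascript
// Encode one column value the way the /E option is described:
// delimiter -> \D, newline -> \N, backslash -> \S.
function encodeValue(s, delim) {
  var out = "", i, c;
  for (i = 0; i < s.length; i++) {
    c = s.charAt(i);
    if (c == delim) c = "\\D";
    else if (c == "\n") c = "\\N";
    else if (c == "\\") c = "\\S";
    out += c;
  }
  return out;
}

// Decode in the same order as the batch code: \N, then \D, then \S.
function decodeValue(s, delim) {
  s = s.split("\\N").join("\n");
  s = s.split("\\D").join(delim);
  return s.split("\\S").join("\\");
}

var original = "c:\\New,dir\nline2";
var encoded  = encodeValue(original, ",");
console.log(encoded);                                 // c:\SNew\Ddir\Nline2
console.log(decodeValue(encoded, ",") === original);  // true
```

Note that the backslash in c:\New becomes \S, so the following N is never mistaken for an encoded newline.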

test3.csv

"2^3=8","(2^2)!=16","Success!",Value4
value1,value2,value3,value4
,,,value4
"value1","value2","value3","value4"
"He said, ""Hey cutie.""","She said, ""Drop dead!""","value3 line1
value3 line2",c:\Windows

test3.bat - Allows for nearly any valid CSV.

@echo off
setlocal enableDelayedExpansion

:: Define LF to contain a linefeed (0x0A) character
set ^"LF=^

^" The empty line above is critical - DO NOT REMOVE

for /f "tokens=1-4 delims=," %%A in ('parseCSV /e /d ^<test3.csv') do (
  %== Load columns with encoded values. The trailing ! is important ==%
  set "A=%%~A"!
  set "B=%%~B"!
  set "C=%%~C"!
  set "D=%%~D"!
  %== Decode values ==%
  for %%L in ("!LF!") do for %%V in (A B C D) do if defined %%V (
    set "%%V=!%%V:\N=%%~L!"
    set "%%V=!%%V:\D=,!"
    set "%%V=!%%V:\S=\!"
  )
  %== Print results ==%
  echo ---------------------
  for %%V in (A B C D) do echo(%%V=!%%V!
  echo(
)

- OUTPUT3 -

---------------------
A=2^3=8
B=(2^2)!=16
C=Success!
D=Value4

---------------------
A=value1
B=value2
C=value3
D=value4

---------------------
A=
B=
C=
D=value4

---------------------
A=value1
B=value2
C=value3
D=value4

---------------------
A=He said, "Hey cutie."
B=She said, "Drop dead!"
C=value3 line1
value3 line2
D=c:\Windows


test4.bat - Allows for nearly any valid CSV, but now uses the %csvGetCol% macro.

@echo off

:: Delayed expansion must be disabled during macro definition
setlocal disableDelayedExpansion
call define_csvGetCol

:: Delayed expansion must be enabled when using %csvGetCol%
setlocal enableDelayedExpansion
for /f "tokens=1-4 delims=," %%A in ('parseCSV /e /d ^<test3.csv') do (
  %== Load and decode column values ==%
  %csvGetCol% A "," %%A
  %csvGetCol% B "," %%B
  %csvGetCol% C "," %%C
  %csvGetCol% D "," %%D
  %== Print results ==%
  echo ---------------------
  for %%V in (A B C D) do echo(%%V=!%%V!
  echo(
)

The output is the same as for test3.bat.


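If the CSV file is very large, it is more efficient to save the output of parseCSV.bat to a temporary file and then read the temporary file with the FOR /F loop.

Some inherent limitations remain for any FOR /F approach:

1) A single FOR /F cannot parse more than 32 columns.

2) The batch line length limit of 8191 characters is still an issue.

Answer 1 (score: 1)

No sample data was provided, so this solution is incomplete.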

@ECHO OFF
SETLOCAL enabledelayedexpansion
(
 FOR /f "delims=" %%a IN (q27830845.txt) DO (
  SET "line=%%a"
  SET "line=!line:,,,= , , ,!"
  SET "line=!line:,,= , ,!"
  FOR /f "tokens=1-4delims=," %%b IN ("!LINE!") DO (
   ECHO(%%a--^>^>%%b++%%c++%%d++%%e++
  )
 )
)>newfile.txt

GOTO:EOF

I used a file named q27830845.txt containing my test data.

col1,col 2,col 3,col4
one,two,three,four
ONE,,THREE,FOUR - no two
ONE,,,FOUR - 3 and 2 missing
,,,Only FOUR

This produces newfile.txt containing

col1,col 2,col 3,col4-->>col1++col 2++col 3++col4++
one,two,three,four-->>one++two++three++four++
ONE,,THREE,FOUR - no two-->>ONE ++ ++THREE++FOUR - no two++
ONE,,,FOUR - 3 and 2 missing-->>ONE ++ ++ ++FOUR - 3 and 2 missing++
,,,Only FOUR-->> ++ ++ ++Only FOUR++

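
Note that the field variables (%%b etc.) may have a space appended, and that an empty field comes back as a single space. No doubt this approach is also sensitive to characters that have meaning to cmd, such as ! and %. The ++ is used only as an obvious visual separator between fields.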
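
The two substitutions can be modeled outside of batch to see where the extra spaces come from. This is a rough JavaScript sketch with hypothetical function names, under the simplifying assumption that FOR /F drops empty pieces between delimiters:

```javascript
// Model of the two SET substitutions: pad runs of commas with spaces so
// that no two delimiters are adjacent.
function padEmptyFields(line) {
  line = line.split(",,,").join(" , , ,");
  return line.split(",,").join(" , ,");
}

// Simplified model of FOR /F "delims=,": empty pieces are dropped.
function splitOnCommas(line) {
  var parts = line.split(","), tokens = [], i;
  for (i = 0; i < parts.length; i++) {
    if (parts[i] !== "") tokens.push(parts[i]);
  }
  return tokens;
}

var padded = padEmptyFields("ONE,,THREE,FOUR");
console.log(padded);                            // ONE , ,THREE,FOUR
console.log(splitOnCommas(padded).join("++"));  // ONE ++ ++THREE++FOUR
```

A line whose only empty field is the first one (for example ,two,three) is not changed by either substitution, so in this model its values would still shift left.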