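I have a comma-delimited text file with three fields. The first field always contains a string, but the second, the third, or both may be empty. I get the expected result when I read the file with the FOR command in three cases: when all fields contain strings, when only the third is empty, and when both the second and third are empty. In those cases, variables read from fields that contain strings equal those strings, and variables read from empty fields are empty. However, when the second field is empty and the third contains a string, I get an unexpected result. The second variable, which should be read from the second field, equals the contents of the third field, and the third variable is empty.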
How can I fix this?
Answer 0 (score: 2)
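This information is copied verbatim from my DosTips post: Safely parse nearly any CSV with parseCSV.bat
People very often want to parse a CSV file with FOR /F. That is an easy task if you know that every column is populated and that no value contains commas, newlines, or quotes. Assuming 4 columns: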
@echo off
for /f "tokens=1-4 delims=," %%A in (test.csv) do (
echo ----------------------
echo A=%%~A
echo B=%%~B
echo C=%%~C
echo D=%%~D
echo(
)
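But things become much more difficult if any of the following apply:
1) Values may be empty, producing consecutive commas. FOR /F treats consecutive delimiters as one, which throws off the column assignments.
2) Quoted values may contain commas. FOR /F incorrectly treats a quoted comma as a column delimiter.
3) Quoted values may contain newlines. FOR /F breaks the record at the newline and incorrectly treats one row as two.
4) Quoted values may contain doubled quotes that represent a single quote, for example "He said, ""Hello there""". A way is needed to convert "" into ".
A secondary problem can arise if delayed expansion is enabled:
5) A FOR variable such as %%A is corrupted if it contains ! (or sometimes ^) and delayed expansion is enabled when the variable is expanded.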
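To make problems 1 and 2 concrete, here is a rough model of how FOR /F "tokens=1-N delims=," assigns tokens. It is written in JavaScript only because parseCSV.bat's engine is JScript, and it assumes Node.js to run. `forFTokens` is a hypothetical name, and the model covers delimiter handling only. It does not cover eol=; lines, skipped blank lines, or the `*` token. It reproduces the question's symptom: when the second field is empty, the third field's text lands in the second variable.

```javascript
// Approximate model of FOR /F "tokens=1-N delims=..." splitting one line.
// Modelled: a run of delimiters never produces an empty token.
// NOT modelled: eol=; comment lines, skipped blank lines, the "*" token.
function forFTokens(line, delims, count) {
  var tokens = [];
  var current = "";
  for (var i = 0; i < line.length; i++) {
    var ch = line.charAt(i);
    if (delims.indexOf(ch) !== -1) {
      // A delimiter only ends a token if one is in progress.
      if (current !== "") tokens.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  if (current !== "") tokens.push(current);

  // Token variables with no matching token expand to nothing.
  var result = [];
  for (var t = 0; t < count; t++) {
    result.push(t < tokens.length ? tokens[t] : "");
  }
  return result;
}

// The question's failing case: returns ["one", "three", ""]
console.log(forFTokens("one,,three", ",", 3));
// Problem 2, a comma inside quotes is still a delimiter: returns ['"a', ' b"', 'c']
console.log(forFTokens('"a, b",c', ",", 3));
```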
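There are fairly simple solutions for some of these problems, but solving all of them in pure batch is very difficult (and slow).
I wrote a hybrid JScript/batch utility called parseCSV.bat that makes it possible to correctly parse nearly any CSV file with FOR /F.
parseCSV.bat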
@if (@X)==(@Y) @end /* harmless hybrid line that begins a JScript comment
::************ Documentation ***********
::parseCSV.bat version 1.0
:::
:::parseCSV [/option]...
:::
::: Parse stdin as CSV and write it to stdout in a way that can be safely
::: parsed by FOR /F. All columns will be enclosed by quotes so that empty
::: columns may be preserved. It also supports delimiters, newlines, and
::: quotes within quoted values. Two consecutive quotes within a quoted value
::: are converted into one quote.
:::
::: Available options:
:::
::: /I:string = Input delimiter. Default is a comma.
:::
::: /O:string = Output delimiter. Default is a comma.
:::
::: /E = Encode output delimiter in value as \D
::: Encode newline in value as \N
::: Encode backslash in value as \S
:::
::: /D = Escape exclamation point and caret for delayed expansion
::: ! becomes ^!
::: ^ becomes ^^
:::
:::parseCSV /?
:::
::: Display this help
:::
:::parseCSV /V
:::
::: Display the version of parseCSV.bat
:::
:::parseCSV.bat was written by Dave Benham. Updates are available at the original
:::posting site: http://www.dostips.com/forum/viewtopic.php?f=3&t=5702
:::
::************ Batch portion ***********
@echo off
if "%~1" equ "/?" (
setlocal disableDelayedExpansion
for /f "delims=: tokens=*" %%A in ('findstr "^:::" "%~f0"') do echo(%%A
exit /b 0
)
if /i "%~1" equ "/V" (
for /f "delims=:" %%A in ('findstr /bc:"::%~nx0 version " "%~f0"') do echo %%A
exit /b 0
)
cscript //E:JScript //nologo "%~f0" %*
exit /b 0
************ JScript portion ***********/
var args = WScript.Arguments.Named,
stdin = WScript.Stdin,
stdout = WScript.Stdout,
escape = args.Exists("E"),
delayed = args.Exists("D"),
inDelim = args.Exists("I") ? args.Item("I") : ",",
outDelim = args.Exists("O") ? args.Item("O") : ",",
quote = false,
ln, c, n;
while (!stdin.AtEndOfStream) {
ln=stdin.ReadLine();
if (!quote) stdout.Write('"');
for (n=0; n<ln.length; n++ ) {
c=ln.charAt(n);
if (c == '"') {
if (quote && ln.charAt(n+1) == '"') {
n++;
} else {
quote=!quote;
continue;
}
}
if (c == inDelim && !quote) c='"'+outDelim+'"';
if (escape) {
if (c == outDelim) c="\\D";
if (c == "\\") c="\\S";
}
if (delayed) {
if (c == "!") c="^!";
if (c == "^") c="^^";
}
stdout.Write(c);
}
stdout.Write( (quote) ? ((escape) ? "\\N" : "\n") : '"\n' );
}
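The core of parseCSV.bat is the character loop in the JScript portion above. Below is a minimal JavaScript restatement of that loop as a pure function. It assumes Node.js, no /E, /D, or /I options, and an array of lines in place of stdin. `toForFSafe` is a hypothetical name. The sketch shows why the output is safe for FOR /F. Every column, including an empty one, becomes a quoted token such as `""`, so no two delimiters are ever adjacent, and `%%~B` strips the quotes again.

```javascript
// Sketch of parseCSV.bat's state machine as a pure function.
// Assumptions: no /E or /D options; output delimiter equals input delimiter;
// lines are passed as an array instead of being read from stdin.
function toForFSafe(lines, delim) {
  var out = "";
  var quote = false;                 // true while inside a quoted CSV value
  for (var i = 0; i < lines.length; i++) {
    var ln = lines[i];
    if (!quote) out += '"';          // open the first column of a new record
    for (var n = 0; n < ln.length; n++) {
      var c = ln.charAt(n);
      if (c === '"') {
        if (quote && ln.charAt(n + 1) === '"') {
          n++;                       // "" inside quotes becomes one literal quote
        } else {
          quote = !quote;            // opening/closing quotes are dropped
          continue;
        }
      }
      // An unquoted delimiter closes the current column and opens the next.
      if (c === delim && !quote) c = '"' + delim + '"';
      out += c;
    }
    // Inside quotes the newline is part of the value; otherwise close the record.
    out += quote ? "\n" : '"\n';
  }
  return out;
}

// Prints "value1","","value3","value4" (followed by a newline)
console.log(toForFSafe(["value1,,value3,value4"], ","));
```

Commas inside a quoted value still pass through unchanged, so FOR /F would still split on them. That is the case the /O and /E options address.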
I also wrote a script that defines a macro to help parse the most problematic CSV files. For background on batch macros with arguments, see http://www.dostips.com/forum/viewtopic.php?f=3&t=1827.
define_csvGetCol.bat
::define_csvGetCol.bat version 1.1
::
:: Defines variable LF and macro csvGetCol to be used with
:: parseCSV.bat to parse nearly any CSV file.
::
:: This script must be called with delayedExpansion disabled.
::
:: The %csvGetCol% macro must be used with delayedExpansion enabled.
::
:: Example usage:
::
:: @echo off
:: setlocal disableDelayedExpansion
:: call define_csvGetCol
:: setlocal enableDelayedExpansion
:: for /f "tokens=1-3 delims=," %%A in ('parseCSV /d /e ^<test.csv') do (
:: %== Load and decode column values ==%
:: %csvGetCol% A "," %%A
:: %csvGetCol% B "," %%B
:: %csvGetCol% C "," %%C
:: %== Display the result ==%
:: echo ----------------------
:: for %%V in (A B C) do echo %%V=!%%V!
:: echo(
:: )
::
:: Written by Dave Benham
::
:: Delayed expansion must be disabled during macro definition
:: Define LF to contain a linefeed (0x0A) character
set ^"LF=^

^" The empty line above is critical - DO NOT REMOVE
:: define a newline with line continuation
set ^"\n=^^^%LF%%LF%^%LF%%LF%^^"
:: Define csvGetCol
:: %csvGetCol% envVarName "Delimiter" FORvar
set csvGetCol=for %%# in (1 2) do if %%#==2 (%\n%
setlocal enableDelayedExpansion^&for /f "tokens=1,2*" %%1 in ("!args!") do (%\n%
endlocal^&endlocal%\n%
set "%%1=%%~3"!%\n%
if defined %%1 (%\n%
for %%L in ("!LF!") do set "%%1=!%%1:\N=%%~L!"%\n%
set "%%1=!%%1:\D=%%~2!"%\n%
set "%%1=!%%1:\S=\!"%\n%
)%\n%
)) else setlocal disableDelayedExpansion ^& set args=
Usage is very simple if you know that no value contains a comma or newline, and delayed expansion is not needed:
test1.csv
"value1 with ""quotes""",value2: No problem!,value3: 2^3=8,value4: (2^2)!=16
value1,,value3,value4
value1,,,value4
value1,,,
,,,value4
test1.bat - no delayed expansion, no commas or newlines in values
@echo off
for /f "tokens=1-4 delims=," %%A in ('parseCSV ^<test1.csv') do (
echo -------------
echo(A=%%~A
echo(B=%%~B
echo(C=%%~C
echo(D=%%~D
echo(
)
-- OUTPUT1 --
-------------
A=value1 with "quotes"
B=value2: No problem!
C=value3: 2^3=8
D=value4: (2^2)!=16
-------------
A=value1
B=
C=value3
D=value4
-------------
A=value1
B=
C=
D=value4
-------------
A=value1
B=
C=
D=
-------------
A=
B=
C=
D=value4
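It is also simple when values contain commas, provided you know a character that does not appear in any value. Just specify that character as the output delimiter.
test2.csv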
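One conservative way to find such a character is to choose one that never occurs anywhere in the file, because it then cannot occur in any value. Here is a small JavaScript sketch, assuming Node.js. The helper name `pickOutputDelimiter` and the candidate list are hypothetical.

```javascript
// Hypothetical helper: return the first candidate character that never occurs
// in the raw CSV text, or null if every candidate is used somewhere.
function pickOutputDelimiter(csvText, candidates) {
  for (var i = 0; i < candidates.length; i++) {
    var d = candidates.charAt(i);
    // Absent from the whole file implies absent from every value. This is
    // stricter than necessary: a character used only outside values is also safe.
    if (csvText.indexOf(d) === -1) return d;
  }
  return null; // nothing safe: fall back to the /E encoding
}

var sample = '"value2, No problem!","value3, 2^3=8"\nvalue1,,value3\n';
console.log(pickOutputDelimiter(sample, ",|~")); // prints |
```

With `|` chosen, the call becomes `parseCSV "/o:|"` and the loop uses `delims=|`, as in test2.bat.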
"value1 with ""quotes""","value2, No problem!","value3, 2^3=8","value4, (2^2)!=16"
value1,,value3,value4
value1,,,value4
value1,,,
,,,value4
test2.bat - no delayed expansion, no newlines or pipes in values. Note that the entire option must be quoted if the delimiter is a poison character.
@echo off
for /f "tokens=1-4 delims=|" %%A in ('parseCSV "/o:|" ^<test2.csv') do (
echo -------------
echo(A=%%~A
echo(B=%%~B
echo(C=%%~C
echo(D=%%~D
echo(
)
-- OUTPUT2 --
-------------
A=value1 with "quotes"
B=value2, No problem!
C=value3, 2^3=8
D=value4, (2^2)!=16
-------------
A=value1
B=
C=value3
D=value4
-------------
A=value1
B=
C=
D=value4
-------------
A=value1
B=
C=
D=
-------------
A=
B=
C=
D=value4
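Only a little more code is needed if values may contain newlines, or if you do not know of a character that is absent from every value. This solution encodes newlines, delimiters, and backslashes as \N, \D, and \S. Delayed expansion is needed within the loop to decode the values, so ! and ^ must be escaped as ^! and ^^.
test3.csv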
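The encoding is reversible because every escape starts with a backslash and every literal backslash is itself escaped. Here is a JavaScript sketch of the round trip, assuming Node.js. `encodeValue` and `decodeValue` are hypothetical names that model the /E branch of parseCSV.bat and the decode loop in test3.bat.

```javascript
// Sketch of the /E encoding (parseCSV.bat) and the matching decode loop
// (test3.bat / %csvGetCol%). Function names are hypothetical.
function encodeValue(value, delim) {
  var out = "";
  for (var i = 0; i < value.length; i++) {
    var c = value.charAt(i);
    if (c === "\n") out += "\\N";        // newline   -> \N
    else if (c === delim) out += "\\D";  // delimiter -> \D
    else if (c === "\\") out += "\\S";   // backslash -> \S
    else out += c;
  }
  return out;
}

function replaceAll(s, from, to) {
  return s.split(from).join(to);
}

function decodeValue(encoded, delim) {
  // Same order as test3.bat: \N, then \D, then \S last. Decoding \S first
  // would turn "\SD" (an encoded literal "\D") into a delimiter.
  var s = replaceAll(encoded, "\\N", "\n");
  s = replaceAll(s, "\\D", delim);
  return replaceAll(s, "\\S", "\\");
}

var enc = encodeValue("C:\\Dir,x", ",");
console.log(enc);                   // prints C:\SDir\Dx
console.log(decodeValue(enc, ",")); // prints C:\Dir,x
```

The order matters: \S has to be decoded last, exactly as in test3.bat and %csvGetCol%.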
"2^3=8","(2^2)!=16","Success!",Value4
value1,value2,value3,value4
,,,value4
"value1","value2","value3","value4"
"He said, ""Hey cutie.""","She said, ""Drop dead!""","value3 line1
value3 line2",c:\Windows
test3.bat - allows nearly any valid CSV.
@echo off
setlocal enableDelayedExpansion
:: Define LF to contain a linefeed (0x0A) character
set ^"LF=^

^" The empty line above is critical - DO NOT REMOVE
for /f "tokens=1-4 delims=," %%A in ('parseCSV /e /d ^<test3.csv') do (
%== Load columns with encoded values. The trailing ! is important ==%
set "A=%%~A"!
set "B=%%~B"!
set "C=%%~C"!
set "D=%%~D"!
%== Decode values ==%
for %%L in ("!LF!") do for %%V in (A B C D) do if defined %%V (
set "%%V=!%%V:\N=%%~L!"
set "%%V=!%%V:\D=,!"
set "%%V=!%%V:\S=\!"
)
%== Print results ==%
echo ---------------------
for %%V in (A B C D) do echo(%%V=!%%V!
echo(
)
-- OUTPUT3 --
---------------------
A=2^3=8
B=(2^2)!=16
C=Success!
D=Value4
---------------------
A=value1
B=value2
C=value3
D=value4
---------------------
A=
B=
C=
D=value4
---------------------
A=value1
B=value2
C=value3
D=value4
---------------------
A=He said, "Hey cutie."
B=She said, "Drop dead!"
C=value3 line1
value3 line2
D=c:\Windows
test4.bat - allows nearly any valid CSV, but now uses the %csvGetCol% macro.
@echo off
:: Delayed expansion must be disabled during macro definition
setlocal disableDelayedExpansion
call define_csvGetCol
:: Delayed expansion must be enabled when using %csvGetCol%
setlocal enableDelayedExpansion
for /f "tokens=1-4 delims=," %%A in ('parseCSV /e /d ^<test3.csv') do (
%== Load and decode column values ==%
%csvGetCol% A "," %%A
%csvGetCol% B "," %%B
%csvGetCol% C "," %%C
%csvGetCol% D "," %%D
%== Print results ==%
echo ---------------------
for %%V in (A B C D) do echo(%%V=!%%V!
echo(
)
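The output is identical to that of test3.bat.
If the CSV file is very large, it is more efficient to save the output of parseCSV.bat to a temporary file and then read the temporary file with a FOR /F loop.
There are still some inherent limitations with any FOR /F usage:
1) A single FOR /F cannot parse more than 32 columns.
2) The batch line-length limit of 8191 characters is still a problem.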
Answer 1 (score: 1)
No sample data was provided, so this solution is incomplete.
@ECHO OFF
SETLOCAL enabledelayedexpansion
(
FOR /f "delims=" %%a IN (q27830845.txt) DO (
SET "line=%%a"
SET "line=!line:,,,= , , ,!"
SET "line=!line:,,= , ,!"
FOR /f "tokens=1-4delims=," %%b IN ("!LINE!") DO (
ECHO(%%a--^>^>%%b++%%c++%%d++%%e++
)
)
)>newfile.txt
GOTO:EOF
I used a file named q27830845.txt containing this test data:
col1,col 2,col 3,col4
one,two,three,four
ONE,,THREE,FOUR - no two
ONE,,,FOUR - 3 and 2 missing
,,,Only FOUR
This produces newfile.txt containing:
col1,col 2,col 3,col4-->>col1++col 2++col 3++col4++
one,two,three,four-->>one++two++three++four++
ONE,,THREE,FOUR - no two-->>ONE ++ ++THREE++FOUR - no two++
ONE,,,FOUR - 3 and 2 missing-->>ONE ++ ++ ++FOUR - 3 and 2 missing++
,,,Only FOUR-->> ++ ++ ++Only FOUR++
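Note that the token variables (%%b through %%e) may have a Space appended. No doubt this will be sensitive to characters that have meaning to cmd, such as ! and %. The ++ is used only as an obvious visual separator between fields.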
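You can check the padding trick outside cmd. This JavaScript sketch, which assumes Node.js and uses the hypothetical name `padEmptyFields`, applies the same two substitutions as the batch code. It also shows one limitation. A line that begins with a single comma is not padded, and because FOR /F skips leading delimiters, the remaining columns would still shift left.

```javascript
// Sketch of answer 1's substitutions: !line:,,,= , , ,! then !line:,,= , ,!.
// Both the batch substitution and split/join replace non-overlapping
// matches from left to right.
function padEmptyFields(line) {
  var s = line.split(",,,").join(" , , ,");
  return s.split(",,").join(" , ,");
}

console.log(JSON.stringify(padEmptyFields("ONE,,,FOUR - 3 and 2 missing"))); // "ONE , , ,FOUR - 3 and 2 missing"
console.log(JSON.stringify(padEmptyFields(",,,Only FOUR")));                  // " , , ,Only FOUR"
// Limitation: a single leading comma is left untouched.
console.log(JSON.stringify(padEmptyFields(",two,three,four")));               // ",two,three,four"
```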