Using a batch script, how can I use a regular expression to split data in a .csv file?

Time: 2019-05-23 10:40:18

Tags: regex csv batch-file

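I have a .csv file (generated by exporting a Google Docs spreadsheet) that I need to extract information from. The information does not have a consistent delimiter.

I am currently using a comma (,) as the delimiter, which works fine when getting information from the first 4 columns.

However, when I want to extract information from column 8, I get the wrong data. This is because some cells contain 2 pieces of information, which are separated by a comma.

Cells that contain 2 pieces of information begin and end with a double quote ("), giving data like 1,"2,3",4

My splitter cannot tell the difference between 1,2,3,4 and 1,"2,3",4, so the third value is returned as 3 for the first set and as 3" for the second set, when the second set should return 4 (the first set is expected to return 3).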
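
To make the symptom concrete, here is a tiny sketch in JavaScript rather than batch (purely illustrative, not the code I am running). A plain comma split treats the quotes as ordinary characters, which is the same mistake my FOR /F loop makes:

```javascript
// Naive comma split: the quotes get no special treatment.
const plain = '1,2,3,4'.split(',');
const quoted = '1,"2,3",4'.split(',');

console.log(plain[2]);   // 3
console.log(quoted[2]);  // 3"   (the value I actually want here is 4)
```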

Below is an extract of the .csv file I am working with.

A,SCONE,Shen ring,SHEN_RING,"FLOUR, BUTTER","BRONZE,GOLD",BLANK,Blank,,BLANK,
A,STRAWBERRIES_AND_CREAM,Cat1,CAT1,"STRAWBERRY, CREAM","OBSIDIAN,GOLD2",FS,FreeSpin,,FREE_SPIN,
A,WALNUT_TOFFEE,Pyramid,PYRAMID,"BUTTER, SUGAR, WALNUT","GOLD,EMERALD,PERIDOT",1,Champagne,Garnet,GARNET,
A,RASPBERRY_AND_LIME_JELLY,Cuff bracelet,CUFF_BRACELET,"RASPBERRY, JELLY, LIME","ZIRCON,BRONZE2,TOPAZ",2,Cocoa,Lapis lazuli,LAPIS_LAZULI,Blue
A,CHOCOLATE_CHIP_COOKIES,Nekhbet,NEKHBET,"SUGAR, FLOUR, BUTTER, CHOCOLATE_CHIPS, SALT","EMERALD,BRONZE,GOLD,ALEXANDRITE,SILVER",3,GoldLeaf,gold3,GOLD3,yellow
A,BUTTER_CREAM_CUP_CAKE,Sobek,SOBEK,"ICING_SUGAR, FLOUR, BUTTER, BUTTERCREAM","JADE,BRONZE,GOLD,GARNET2",4,Sugar,emerald,EMERALD,green
A,PEANUT_BUTTER_COOKIE,Sekhmet,SEKHMET,"PEANUT_BUTTER, FLOUR, SUGAR, BAKING_POWDER","GARNET1,BRONZE,AMAZONITE,EMERALD",5,IcingSugar,JADE,JADE,green
A,CHOCOLATE_MARSHMALLOWS,Osiris,OSIRIS,"MARSHMALLOW, CHOCOLATE_CHIPS","PLATINUM,ALEXANDRITE",6,Flour,Bronze,BRONZE,yellow
,,,,,,7,Butter,Gold,GOLD,yellow
B,BLUEBERRY_PIE,Ankh,ANKH,"BLUEBERRY, SUGAR, FLOUR, BUTTER","JADEITE,EMERALD,BRONZE,GOLD",8,ChocolateChips,Alexandrite,ALEXANDRITE,

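This is the FOR loop I am currently using to extract the information. The outer FOR loop checks for empty data to ensure the same column is always returned. The inner FOR loop puts the data values into arrays.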

SET originalCol=8
SET newCol=10
SET startRow=2
SET lastRow=45
SET rowsToSkip=1
SET /a i=0
SET /a totalValues=0
SET /a maxLines=%lastRow%-%startRow%
FOR /f "skip=%rowsToSkip% delims=" %%L in (%fileLocation%) DO (
    set "line=%%L,,,,,,,,"
    set "line=#!line:,=,#!"
    FOR /f "tokens=1,%originalCol%,%newCol% delims=," %%F IN ("!line!") DO (
        set "param1=%%F"
        set "param2=%%G"
        set "param3=%%H"
        set "param1=!param1:~1!"
        set "param2=!param2:~1!"
        set "param3=!param3:~1!"
        IF NOT #!param1!# == ## (
            SET /a lineCounter=!i!+%startRow%
            SET /a totalValues=!i!
            SET originalValuesList[!i!]=!param2!
            SET newValuesList[!i!]=!param3!
            IF !i! == %maxLines% (
                goto :copyingCSVDataComplete
            ) ELSE (
                SET /a i+=1
            )
        )
    )
)
echo.  originalValuesList [A] & echo [%originalValuesList[0]%, %originalValuesList[1]%, %originalValuesList[2]%, %originalValuesList[3]%, %originalValuesList[4]%, %originalValuesList[5]%, %originalValuesList[6]%, %originalValuesList[7]%]
echo.
echo.  originalValuesList [B] & echo [%originalValuesList[8]%]
echo.
echo.  newValuesList [A] & echo [%newValuesList[0]%, %newValuesList[1]%, %newValuesList[2]%, %newValuesList[3]%, %newValuesList[4]%, %newValuesList[5]%, %newValuesList[6]%, %newValuesList[7]%]
echo.
echo.  newValuesList [B] & echo [%newValuesList[8]%]

Actual:

  originalValuesList [A]
[GOLD", GOLD2", "GOLD, "ZIRCON,  CHOCOLATE_CHIPS,  BUTTERCREAM",  BAKING_POWDER", ALEXANDRITE"]

  originalValuesList [B]
[ BUTTER"]



  newValuesList [A]
[Blank, FreeSpin, PERIDOT", TOPAZ", "EMERALD, BRONZE, BRONZE, Flour]

  newValuesList [B]
[EMERALD]

Expected:

  originalValuesList [A]
[Blank, FreeSpin, Champagne, Cocoa, GoldLeaf, Sugar, IcingSugar, flour]

  originalValuesList [B]
[ChocolateChips]



  newValuesList [A]
[BLANK, FREE_SPIN, GARNET, LAPIS_LAZULI, GOLD3, EMERALD, JADE, BRONZE]

  newValuesList [B]
[ALEXANDRITE]

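So I want to use the same code, but instead of splitting on the comma (,) delimiter, I would like to split based on a regular expression. Something like (,"(([A-Z]*),"))|(,)

Is it possible to use a regular expression in batch, and if so, how do I use it to split a string?

1 answer:

Answer 0: (score: 0)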

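First off, PowerShell has built-in functionality for parsing and processing CSV documents, so it would be a much better choice. But I will stick with batch.

Regular Expression Solution

Regular expressions are not well suited to a pure native batch solution, for two reasons:

  • There is no way to change FOR /F so that it parses tokens by regular expression. It is what it is, and it is very limited.
  • To parse the file with FOR /F, you would need to manipulate each line before parsing it. Batch does not have any regular expression utility that can change content. It only has FINDSTR, which can perform very crude regular expression searches but always returns the original matching line. On top of that, FINDSTR regular expressions are so crippled that I am not sure you could properly parse a CSV with them anyway.

You could use custom JScript or VBScript via CSCRIPT to preprocess the file with a regular expression search and replace, in a way that lets FOR /F parse the result. I have already written a hybrid JScript/batch regular expression processing utility called JREPL.BAT that works well for this.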

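Quoted CSV fields can contain quote literals, in which case each quote literal is doubled. The following regular expression will match any CSV token (not including the comma delimiter): ("(?:""|[^"])*"|[^,"]*). It looks for an opening quote, followed by any number of non-quote characters and/or doubled quotes, followed by a closing quote; failing that, it matches any number of characters that are neither a quote nor a comma. But your CSV does not contain any doubled quote literals, so the regular expression can be simplified to ("[^"]*"|[^,"]*)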
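
As a rough illustration of how the full token pattern behaves, here is a minimal sketch in plain JavaScript (not JREPL and not batch). The function name splitCsvLine is made up for this example, and the sketch assumes a single well-formed line with no embedded newlines:

```javascript
// Hypothetical helper: split one CSV line into fields using the token regex from the text.
function splitCsvLine(line) {
  // Same token pattern as above; the sticky flag (y) anchors each match at lastIndex.
  const token = /("(?:""|[^"])*"|[^,"]*)/y;
  const fields = [];
  let pos = 0;
  while (true) {
    token.lastIndex = pos;
    const m = token.exec(line);        // always matches: the second alternative may be empty
    let field = m[1];
    if (field.startsWith('"')) {
      // Quoted token: strip the enclosing quotes and undouble "" into "
      field = field.slice(1, -1).replace(/""/g, '"');
    }
    fields.push(field);
    pos = token.lastIndex;             // position just past the token
    if (line[pos] !== ',') break;      // end of line (or malformed input): stop
    pos += 1;                          // skip the comma delimiter
  }
  return fields;
}

console.log(splitCsvLine('1,"2,3",4'));
// [ '1', '2,3', '4' ]
```

Note that an empty token still yields an empty string in the result, which is exactly the case FOR /F cannot handle on its own.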

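CSCRIPT has no mechanism for passing quote literals within arguments, so JREPL has the /XSEQ option to enable support for extended escape sequences, including \q to represent ". Another option is to use the standard \x22 sequence. The command JREPL "(\q[^\q]*\q|[^,\q]*)," "$1;" /XSEQ /F "test.csv" will match any token (possibly empty) followed by a comma delimiter, preserving the token and replacing the comma with a semicolon.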
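
For readers without JREPL at hand, the same substitution can be sketched in plain JavaScript, whose regular expressions behave like JScript's for this pattern. This is only an illustration of the search and replace, and the function name delimitersToSemicolons is made up here:

```javascript
// Hypothetical helper mirroring the JREPL call:
// keep each token ($1) and turn the comma that follows it into a semicolon.
function delimitersToSemicolons(line) {
  return line.replace(/("[^"]*"|[^,"]*),/g, '$1;');
}

console.log(delimitersToSemicolons('A,"FLOUR, BUTTER",,BLANK,'));
// A;"FLOUR, BUTTER";;BLANK;
```

The comma inside the quoted field survives because the first alternative consumes the whole quoted token before the delimiter is matched.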

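But this still leaves empty tokens, and FOR /F does not properly parse empty tokens. So I can add a bit of JScript to the replacement that removes any existing quotes and then encloses every token in quotes (except for the last token, where it is not needed):
JREPL "(\q[^\q]*\q|[^,\q]*)," "$txt='\x22'+$1.replace(/\x22/g,'')+'\x22;'" /JQ /XSEQ /F "test.csv"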
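
The effect of that replacement expression can be sketched in plain JavaScript as follows. Again this is just an illustration under the same regex; quoteTokens is a hypothetical name, not part of JREPL:

```javascript
// Hypothetical helper: strip any quotes inside each matched token,
// then wrap the token in quotes and follow it with a semicolon.
function quoteTokens(line) {
  return line.replace(/("[^"]*"|[^,"]*),/g,
    (match, token) => '"' + token.replace(/"/g, '') + '";');
}

console.log(quoteTokens('A,"FLOUR, BUTTER",,BLANK,yellow'));
// "A";"FLOUR, BUTTER";"";"BLANK";yellow
```

The empty third field becomes "", so FOR /F no longer collapses it. The final token is left untouched because no comma follows it, so the pattern never matches it.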

Here is a demo showing how it can be used to parse the CSV:

@echo off
for /f "tokens=1-11 delims=;" %%A in (
  'JREPL "(\q[^\q]*\q|[^,\q]*)," "$txt='\x22'+$1.replace(/\x22/g,'')+'\x22;'" /JQ /XSEQ /F test.csv'
) do (
  echo A=%%~A
  echo B=%%~B
  echo C=%%~C
  echo D=%%~D
  echo E=%%~E
  echo F=%%~F
  echo G=%%~G
  echo H=%%~H
  echo I=%%~I
  echo J=%%~J
  echo K=%%~K
  echo(
)

-- OUTPUT --

A=A
B=SCONE
C=Shen ring
D=SHEN_RING
E=FLOUR, BUTTER
F=BRONZE,GOLD
G=blank
H="This
I="BLANK""
J=
K=BLANK

A=A
B=STRAWBERRIES_AND_CREAM
C=Cat1
D=CAT1
E=STRAWBERRY, CREAM
F=OBSIDIAN,GOLD2
G=FS
H=FreeSpin
I=
J=FREE_SPIN
K=

A=A
B=WALNUT_TOFFEE
C=Pyramid
D=PYRAMID
E=BUTTER, SUGAR, WALNUT
F=GOLD,EMERALD,PERIDOT
G=1
H=Champagne
I=Garnet
J=GARNET
K=

A=A
B=RASPBERRY_AND_LIME_JELLY
C=Cuff bracelet
D=CUFF_BRACELET
E=RASPBERRY, JELLY, LIME
F=ZIRCON,BRONZE2,TOPAZ
G=2
H=Cocoa
I=Lapis lazuli
J=LAPIS_LAZULI
K=Blue

A=A
B=CHOCOLATE_CHIP_COOKIES
C=Nekhbet
D=NEKHBET
E=SUGAR, FLOUR, BUTTER, CHOCOLATE_CHIPS, SALT
F=EMERALD,BRONZE,GOLD,ALEXANDRITE,SILVER
G=3
H=GoldLeaf
I=gold3
J=GOLD3
K=yellow

A=A
B=BUTTER_CREAM_CUP_CAKE
C=Sobek
D=SOBEK
E=ICING_SUGAR, FLOUR, BUTTER, BUTTERCREAM
F=JADE,BRONZE,GOLD,GARNET2
G=4
H=Sugar
I=emerald
J=EMERALD
K=green

A=A
B=PEANUT_BUTTER_COOKIE
C=Sekhmet
D=SEKHMET
E=PEANUT_BUTTER, FLOUR, SUGAR, BAKING_POWDER
F=GARNET1,BRONZE,AMAZONITE,EMERALD
G=5
H=IcingSugar
I=JADE
J=JADE
K=green

A=A
B=CHOCOLATE_MARSHMALLOWS
C=Osiris
D=OSIRIS
E=MARSHMALLOW, CHOCOLATE_CHIPS
F=PLATINUM,ALEXANDRITE
G=6
H=Flour
I=Bronze
J=BRONZE
K=yellow

A=
B=
C=
D=
E=
F=
G=7
H=Butter
I=Gold
J=GOLD
K=yellow

A=B
B=BLUEBERRY_PIE
C=Ankh
D=ANKH
E=BLUEBERRY, SUGAR, FLOUR, BUTTER
F=JADEITE,EMERALD,BRONZE,GOLD
G=8
H=ChocolateChips
I=Alexandrite
J=ALEXANDRITE
K=

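But I would not use regular expressions for this. There are other ways.

Pure Native Batch Solution

Believe it or not, it is not that hard to use internal batch commands to manipulate each line so that FOR /F can parse all the tokens.

Two things need to happen with your CSV:

1) Unquoted comma delimiters must be converted to some other character that does not appear in the file, leaving only the quoted commas. I can use a derivative of a technique that jeb developed to tell quoted characters from unquoted ones: when a variable is expanded with percent expansion, an escaped character like ^, is treated differently depending on whether it is quoted. Normally, ^, becomes , while "^," remains unchanged. But if you use CALL, then "^," becomes "^^," while ^, remains unchanged. Either way, quoted and unquoted characters can be told apart.

2) FOR /F cannot parse empty tokens, so empty tokens must be enclosed in quotes. It is simplest to enclose all tokens in quotes.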
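
To show what the line should look like after both steps, here is a small JavaScript sketch of the same transformation. It does not use the batch escaping trick; it simply tracks whether the current character is inside quotes. The name prepareLine and the choice of ; as the replacement delimiter are assumptions for this example:

```javascript
// Hypothetical helper: produce the form that FOR /F "delims=;" can parse safely.
function prepareLine(line) {
  let out = '';
  let inQuotes = false;
  for (const ch of line) {
    if (ch === '"') {
      inQuotes = !inQuotes;            // entering or leaving a quoted field; drop the quote
    } else if (ch === ',' && !inQuotes) {
      out += '";"';                    // unquoted comma: close token, new delimiter, open token
    } else {
      out += ch;                       // quoted commas and all other characters pass through
    }
  }
  return '"' + out + '"';              // enclose the first and last tokens as well
}

console.log(prepareLine('A,"FLOUR, BUTTER",,BLANK,'));
// "A";"FLOUR, BUTTER";"";"BLANK";""
```

Like the simple batch version, this drops every quote, so doubled quote literals inside a field are lost.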

@echo off
setlocal enableDelayedExpansion
for /f "usebackq delims=" %%A in ("test.csv") do (

  %= Print out the raw line so we can verify the end result =%
  echo %%A

  %= Preprocess the line so it is safe to parse =%
  set "ln=%%A"           %= Transfer line to environment variable =%

  %= Artifact of CALL - Convert quoted , to ^^; and unquoted , to ^;        =%
  %= Make sure unquoted SET statement does not have any trailing characters =%
  call set ln=%%ln:,=^^;%%

  set "ln=!ln:^^;=,!"    %= Convert quoted ^^; back into ,                         =%
  set "ln=!ln:^;=;!"     %= Convert unquoted ^; to ;                               =%
  set "ln=!ln:"=!"       %= Strip all quotes so we can safely do next step         =%
  set "ln="!ln:;=";"!""  %= Enclose all tokens in quotes to protect empty tokens   =%

  %= The line is now ready to parse with another FOR /F     =%
  %= I simply print the value of all 11 tokens, 1 per line. =%
  %= Adjust the loop as needed to suit your needs.          =%
  for /f "tokens=1-11 delims=;" %%A in ("!ln!") do (
    for %%a in (A B C D E F G H I J K) do call :echoToken %%a
    echo(
  )
)
exit /b

:echoToken  Char
for %%. in (.) do echo %1=%%~%1
exit /b

Here is the same code without all the comments:

@echo off
setlocal enableDelayedExpansion
for /f "usebackq delims=" %%A in ("test.csv") do (
  echo %%A
  set "ln=%%A"
  call set ln=%%ln:,=^^;%%
  set "ln=!ln:^^;=,!"
  set "ln=!ln:^;=;!"
  set "ln=!ln:"=!"
  set "ln="!ln:;=";"!""
  for /f "tokens=1-11 delims=;" %%A in ("!ln!") do (
    for %%a in (A B C D E F G H I J K) do call :echoToken %%a
    echo(
  )
)
exit /b

:echoToken  Char
for %%. in (.) do echo %1=%%~%1
exit /b

-- OUTPUT --

A,SCONE,Shen ring,SHEN_RING,"FLOUR, BUTTER","BRONZE,GOLD",blank,"This,""BLANK""",,BLANK,
A=A
B=SCONE
C=Shen ring
D=SHEN_RING
E=FLOUR, BUTTER
F=BRONZE,GOLD
G=blank
H=This,BLANK
I=
J=BLANK
K=

A,STRAWBERRIES_AND_CREAM,Cat1,CAT1,"STRAWBERRY, CREAM","OBSIDIAN,GOLD2",FS,FreeSpin,,FREE_SPIN,
A=A
B=STRAWBERRIES_AND_CREAM
C=Cat1
D=CAT1
E=STRAWBERRY, CREAM
F=OBSIDIAN,GOLD2
G=FS
H=FreeSpin
I=
J=FREE_SPIN
K=

A,WALNUT_TOFFEE,Pyramid,PYRAMID,"BUTTER, SUGAR, WALNUT","GOLD,EMERALD,PERIDOT",1,Champagne,Garnet,GARNET,
A=A
B=WALNUT_TOFFEE
C=Pyramid
D=PYRAMID
E=BUTTER, SUGAR, WALNUT
F=GOLD,EMERALD,PERIDOT
G=1
H=Champagne
I=Garnet
J=GARNET
K=

A,RASPBERRY_AND_LIME_JELLY,Cuff bracelet,CUFF_BRACELET,"RASPBERRY, JELLY, LIME","ZIRCON,BRONZE2,TOPAZ",2,Cocoa,Lapis lazuli,LAPIS_LAZULI,Blue
A=A
B=RASPBERRY_AND_LIME_JELLY
C=Cuff bracelet
D=CUFF_BRACELET
E=RASPBERRY, JELLY, LIME
F=ZIRCON,BRONZE2,TOPAZ
G=2
H=Cocoa
I=Lapis lazuli
J=LAPIS_LAZULI
K=Blue

A,CHOCOLATE_CHIP_COOKIES,Nekhbet,NEKHBET,"SUGAR, FLOUR, BUTTER, CHOCOLATE_CHIPS, SALT","EMERALD,BRONZE,GOLD,ALEXANDRITE,SILVER",3,GoldLeaf,gold3,GOLD3,yellow
A=A
B=CHOCOLATE_CHIP_COOKIES
C=Nekhbet
D=NEKHBET
E=SUGAR, FLOUR, BUTTER, CHOCOLATE_CHIPS, SALT
F=EMERALD,BRONZE,GOLD,ALEXANDRITE,SILVER
G=3
H=GoldLeaf
I=gold3
J=GOLD3
K=yellow

A,BUTTER_CREAM_CUP_CAKE,Sobek,SOBEK,"ICING_SUGAR, FLOUR, BUTTER, BUTTERCREAM","JADE,BRONZE,GOLD,GARNET2",4,Sugar,emerald,EMERALD,green
A=A
B=BUTTER_CREAM_CUP_CAKE
C=Sobek
D=SOBEK
E=ICING_SUGAR, FLOUR, BUTTER, BUTTERCREAM
F=JADE,BRONZE,GOLD,GARNET2
G=4
H=Sugar
I=emerald
J=EMERALD
K=green

A,PEANUT_BUTTER_COOKIE,Sekhmet,SEKHMET,"PEANUT_BUTTER, FLOUR, SUGAR, BAKING_POWDER","GARNET1,BRONZE,AMAZONITE,EMERALD",5,IcingSugar,JADE,JADE,green
A=A
B=PEANUT_BUTTER_COOKIE
C=Sekhmet
D=SEKHMET
E=PEANUT_BUTTER, FLOUR, SUGAR, BAKING_POWDER
F=GARNET1,BRONZE,AMAZONITE,EMERALD
G=5
H=IcingSugar
I=JADE
J=JADE
K=green

A,CHOCOLATE_MARSHMALLOWS,Osiris,OSIRIS,"MARSHMALLOW, CHOCOLATE_CHIPS","PLATINUM,ALEXANDRITE",6,Flour,Bronze,BRONZE,yellow
A=A
B=CHOCOLATE_MARSHMALLOWS
C=Osiris
D=OSIRIS
E=MARSHMALLOW, CHOCOLATE_CHIPS
F=PLATINUM,ALEXANDRITE
G=6
H=Flour
I=Bronze
J=BRONZE
K=yellow

,,,,,,7,Butter,Gold,GOLD,yellow
A=
B=
C=
D=
E=
F=
G=7
H=Butter
I=Gold
J=GOLD
K=yellow

B,BLUEBERRY_PIE,Ankh,ANKH,"BLUEBERRY, SUGAR, FLOUR, BUTTER","JADEITE,EMERALD,BRONZE,GOLD",8,ChocolateChips,Alexandrite,ALEXANDRITE,
A=B
B=BLUEBERRY_PIE
C=Ankh
D=ANKH
E=BLUEBERRY, SUGAR, FLOUR, BUTTER
F=JADEITE,EMERALD,BRONZE,GOLD
G=8
H=ChocolateChips
I=Alexandrite
J=ALEXANDRITE
K=

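But there are many situations that make CSV parsing considerably more complicated.

Here is a robust pure batch solution that should parse any CSV, as long as no field contains a newline, no line exceeds the 8191-byte batch limit, and you do not need to parse more than 31 tokens. The code is heavily commented to explain all the required steps.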

@echo off
setlocal enableDelayedExpansion

:: Must use arcane FOR /F option syntax to disable both EOL and DELIMS.
for /f usebackq^ delims^=^ eol^= %%A in ("test2.csv") do call :processLine
:: I CALL out of the loop to a :subroutine because a single CALL :subroutine
:: is much faster than many CALL SET statements. It also simplifies the
:: management of delayed expansion.

exit /b


:processLine

:: Must disable delayed expansion so percent expansion does not corrupt ! or ^ literals.
setlocal disableDelayedExpansion

:: FOR variables are global - this extra FOR loop exposes %%A that would otherwise be hidden.
for %%. in (.) do set "ln=%%A"

:: Print out raw line so we can diagnose the result.
set ln

:: "Hide" quotes by doubling, making all characters safe for percent expansion when
:: entire string is quoted. Also enclose line within extra set of , delimiters.
set "ln=,%ln:"=""%,"

:: Escape poison characters so all characters are safe for unquoted percent expansion.
set "ln=%ln:^=^^^^%" %= Double escaped to account for enabled delayed expansion later on. =%
set "ln=%ln:&=^&%"
set "ln=%ln:|=^|%"
set "ln=%ln:<=^<%"
set "ln=%ln:>=^>%"

:: Double escape ! so not corrupted by later percent expansion while delayed expansion enabled.
set "ln=%ln:!=^^!%"

:: Double and escape all commas.   , -> ^,^,
set "ln=%ln:,=^,^,%"

:: Undouble quotes and unescape (originally) unquoted strings. Note that outer quotes are escaped.
set ^"ln=%ln:""="%^"

:: At this point quoted comma literals are still ^,^, whereas unquoted comma delimiters are ,,
:: Also, all quoted poison characters are still escaped, but unquoted ones are not.

:: Redouble quotes, all characters safe again for quoted percent expansion.
set "ln=%ln:"=""%"

:: Encode @ as @a and quoted comma literals ^,^, as @c
set "ln=%ln:@=@a%"
set "ln=%ln:^,^,=@c%"

:: Restore delayed expansion and undouble quotes, which unescapes (originally) quoted strings.
:: Note that outer quotes are NOT escaped this time. The ENDLOCAL and SET are on the same
:: line so that the percent expansion value is transferred across the ENDLOCAL barrier.
endlocal & set "ln=%ln:""="%" !   %= Trailing ! is ignored except forces all ^^ to become ^ =%

:: At this point no characters are escaped, and all ! and ^ are unprotected against percent or
:: FOR variable expansion while delayed expansion is enabled.

:: Remove enclosing quotes from tokens that are already quoted so we can later safely enclose
:: all tokens in quotes. This is why the extra enclosing , were added at the beginning.
set "ln=!ln:,,"=,,!"
set "ln=!ln:",,=,,!"

:: Remove outer , delimiters that were added at the beginning.
set "ln=!ln:~2,-2!"

:: Must double escape ! and ^ again to protect against delayed expansion within parsing FOR /F loop.
set "ln=!ln:^=^^^^!"
set "ln=%ln:!=^^^!%"

:: Undouble remaining quotes because quote literals are doubled within original CSV.
set "ln=!ln:""="!"

:: Restore doubled ,, delimiters to , and enclose all tokens within quotes to preserves empty tokens.
set "ln="!ln:,,=","!"" !

:: The line is now safe to parse with FOR /F, though @ and , are encoded as @a and @c

:: Parse line into tokens.
for /f "tokens=1-11 delims=," %%A in ("!ln!") do (

  %= Decode the tokens and store result in environment variables =%
  for %%a in (A B C D E F G H I J K) do call :decodeToken %%a

  %= Your processing goes here. Decoded %%A - %%K are now safely in !A! - !K! =%
  %= I will simply echo all the values, one per line =%
  for %%a in (A B C D E F G H I J K) do echo %%a=!%%a!
  echo(
)
exit /b


:decodeToken  Char
:: Converts @c and @a back into , and @
for %%. in (.) do set "%1=%%~%1" !
if defined %1 (
  set "%1=!%1:@c=,!"
  set "%1=!%1:@a=@!"
)
exit /b

Here is the same code without all the comments:

@echo off
setlocal enableDelayedExpansion
for /f usebackq^ delims^=^ eol^= %%A in ("test2.csv") do call :processLine
exit /b

:processLine
setlocal disableDelayedExpansion
for %%. in (.) do set "ln=%%A"
set ln
set "ln=,%ln:"=""%,"
set "ln=%ln:^=^^^^%"
set "ln=%ln:&=^&%"
set "ln=%ln:|=^|%"
set "ln=%ln:<=^<%"
set "ln=%ln:>=^>%"
set "ln=%ln:!=^^!%"
set "ln=%ln:,=^,^,%"
set ^"ln=%ln:""="%^"
set "ln=%ln:"=""%"
set "ln=%ln:@=@a%"
set "ln=%ln:^,^,=@c%"
endlocal & set "ln=%ln:""="%" !
set "ln=!ln:,,"=,,!"
set "ln=!ln:",,=,,!"
set "ln=!ln:~2,-2!"
set "ln=!ln:^=^^^^!"
set "ln=%ln:!=^^^!%"
set "ln=!ln:""="!"
set "ln="!ln:,,=","!"" !
for /f "tokens=1-11 delims=," %%A in ("!ln!") do (
  for %%a in (A B C D E F G H I J K) do call :decodeToken %%a
  for %%a in (A B C D E F G H I J K) do echo %%a=!%%a!
  echo(
)
exit /b

:decodeToken  Char
for %%. in (.) do set "%1=%%~%1" !
if defined %1 (
  set "%1=!%1:@c=,!"
  set "%1=!%1:@a=@!"
)
exit /b
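
The @a / @c encoding used by the script is easier to reason about outside of batch. Below is a minimal JavaScript sketch of the round trip; encodeField and decodeToken are illustrative names, and the sketch only models the encoding itself, not the batch escaping around it:

```javascript
// Encode: @ becomes @a first, then each comma literal becomes @c.
// Doing @ first guarantees that every @ in the encoded text starts an escape.
function encodeField(value) {
  return value.replace(/@/g, '@a').replace(/,/g, '@c');
}

// Decode in the opposite order: @c back to a comma, then @a back to @.
function decodeToken(token) {
  return token.replace(/@c/g, ',').replace(/@a/g, '@');
}

console.log(encodeField('F ,x'));               // F @cx
console.log(decodeToken(encodeField('F ,x')));  // F ,x
console.log(decodeToken(encodeField('@c')));    // @c
```

Because a literal @c in the data is encoded as @ac, it cannot be mistaken for an encoded comma when decoding.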

Here is your sample CSV file, with one line added to test various complications:

;A!,"B!","C is ""cool""",D @^&|<>,"E @^&|<>","F ,x","G ""@^&|<>""","H ""@^&|<>!""",I,J,K
A,SCONE,Shen ring,SHEN_RING,"FLOUR, BUTTER","BRONZE,GOLD",blank,"This,""BLANK""",,BLANK,
A,STRAWBERRIES_AND_CREAM,Cat1,CAT1,"STRAWBERRY, CREAM","OBSIDIAN,GOLD2",FS,FreeSpin,,FREE_SPIN,
A,WALNUT_TOFFEE,Pyramid,PYRAMID,"BUTTER, SUGAR, WALNUT","GOLD,EMERALD,PERIDOT",1,Champagne,Garnet,GARNET,
A,RASPBERRY_AND_LIME_JELLY,Cuff bracelet,CUFF_BRACELET,"RASPBERRY, JELLY, LIME","ZIRCON,BRONZE2,TOPAZ",2,Cocoa,Lapis lazuli,LAPIS_LAZULI,Blue
A,CHOCOLATE_CHIP_COOKIES,Nekhbet,NEKHBET,"SUGAR, FLOUR, BUTTER, CHOCOLATE_CHIPS, SALT","EMERALD,BRONZE,GOLD,ALEXANDRITE,SILVER",3,GoldLeaf,gold3,GOLD3,yellow
A,BUTTER_CREAM_CUP_CAKE,Sobek,SOBEK,"ICING_SUGAR, FLOUR, BUTTER, BUTTERCREAM","JADE,BRONZE,GOLD,GARNET2",4,Sugar,emerald,EMERALD,green
A,PEANUT_BUTTER_COOKIE,Sekhmet,SEKHMET,"PEANUT_BUTTER, FLOUR, SUGAR, BAKING_POWDER","GARNET1,BRONZE,AMAZONITE,EMERALD",5,IcingSugar,JADE,JADE,green
A,CHOCOLATE_MARSHMALLOWS,Osiris,OSIRIS,"MARSHMALLOW, CHOCOLATE_CHIPS","PLATINUM,ALEXANDRITE",6,Flour,Bronze,BRONZE,yellow
,,,,,,7,Butter,Gold,GOLD,yellow
B,BLUEBERRY_PIE,Ankh,ANKH,"BLUEBERRY, SUGAR, FLOUR, BUTTER","JADEITE,EMERALD,BRONZE,GOLD",8,ChocolateChips,Alexandrite,ALEXANDRITE,

Here is the final output:

ln=;A!,"B!","C is ""cool""",D @^&|<>,"E @^&|<>","F ,x","G ""@^&|<>""","H ""@^&|<>!""",I,J,K
A=;A!
B=B!
C=C is "cool"
D=D @^&|<>
E=E @^&|<>
F=F ,x
G=G "@^&|<>"
H=H "@^&|<>!"
I=I
J=J
K=K

ln=A,SCONE,Shen ring,SHEN_RING,"FLOUR, BUTTER","BRONZE,GOLD",blank,"This,""BLANK""",,BLANK,
A=A
B=SCONE
C=Shen ring
D=SHEN_RING
E=FLOUR, BUTTER
F=BRONZE,GOLD
G=blank
H=This,"BLANK"
I=
J=BLANK
K=

ln=A,STRAWBERRIES_AND_CREAM,Cat1,CAT1,"STRAWBERRY, CREAM","OBSIDIAN,GOLD2",FS,FreeSpin,,FREE_SPIN,
A=A
B=STRAWBERRIES_AND_CREAM
C=Cat1
D=CAT1
E=STRAWBERRY, CREAM
F=OBSIDIAN,GOLD2
G=FS
H=FreeSpin
I=
J=FREE_SPIN
K=

ln=A,WALNUT_TOFFEE,Pyramid,PYRAMID,"BUTTER, SUGAR, WALNUT","GOLD,EMERALD,PERIDOT",1,Champagne,Garnet,GARNET,
A=A
B=WALNUT_TOFFEE
C=Pyramid
D=PYRAMID
E=BUTTER, SUGAR, WALNUT
F=GOLD,EMERALD,PERIDOT
G=1
H=Champagne
I=Garnet
J=GARNET
K=

ln=A,RASPBERRY_AND_LIME_JELLY,Cuff bracelet,CUFF_BRACELET,"RASPBERRY, JELLY, LIME","ZIRCON,BRONZE2,TOPAZ",2,Cocoa,Lapis lazuli,LAPIS_LAZULI,Blue
A=A
B=RASPBERRY_AND_LIME_JELLY
C=Cuff bracelet
D=CUFF_BRACELET
E=RASPBERRY, JELLY, LIME
F=ZIRCON,BRONZE2,TOPAZ
G=2
H=Cocoa
I=Lapis lazuli
J=LAPIS_LAZULI
K=Blue

ln=A,CHOCOLATE_CHIP_COOKIES,Nekhbet,NEKHBET,"SUGAR, FLOUR, BUTTER, CHOCOLATE_CHIPS, SALT","EMERALD,BRONZE,GOLD,ALEXANDRITE,SILVER",3,GoldLeaf,gold3,GOLD3,yellow
A=A
B=CHOCOLATE_CHIP_COOKIES
C=Nekhbet
D=NEKHBET
E=SUGAR, FLOUR, BUTTER, CHOCOLATE_CHIPS, SALT
F=EMERALD,BRONZE,GOLD,ALEXANDRITE,SILVER
G=3
H=GoldLeaf
I=gold3
J=GOLD3
K=yellow

ln=A,BUTTER_CREAM_CUP_CAKE,Sobek,SOBEK,"ICING_SUGAR, FLOUR, BUTTER, BUTTERCREAM","JADE,BRONZE,GOLD,GARNET2",4,Sugar,emerald,EMERALD,green
A=A
B=BUTTER_CREAM_CUP_CAKE
C=Sobek
D=SOBEK
E=ICING_SUGAR, FLOUR, BUTTER, BUTTERCREAM
F=JADE,BRONZE,GOLD,GARNET2
G=4
H=Sugar
I=emerald
J=EMERALD
K=green

ln=A,PEANUT_BUTTER_COOKIE,Sekhmet,SEKHMET,"PEANUT_BUTTER, FLOUR, SUGAR, BAKING_POWDER","GARNET1,BRONZE,AMAZONITE,EMERALD",5,IcingSugar,JADE,JADE,green
A=A
B=PEANUT_BUTTER_COOKIE
C=Sekhmet
D=SEKHMET
E=PEANUT_BUTTER, FLOUR, SUGAR, BAKING_POWDER
F=GARNET1,BRONZE,AMAZONITE,EMERALD
G=5
H=IcingSugar
I=JADE
J=JADE
K=green

ln=A,CHOCOLATE_MARSHMALLOWS,Osiris,OSIRIS,"MARSHMALLOW, CHOCOLATE_CHIPS","PLATINUM,ALEXANDRITE",6,Flour,Bronze,BRONZE,yellow
A=A
B=CHOCOLATE_MARSHMALLOWS
C=Osiris
D=OSIRIS
E=MARSHMALLOW, CHOCOLATE_CHIPS
F=PLATINUM,ALEXANDRITE
G=6
H=Flour
I=Bronze
J=BRONZE
K=yellow

ln=,,,,,,7,Butter,Gold,GOLD,yellow
A=
B=
C=
D=
E=
F=
G=7
H=Butter
I=Gold
J=GOLD
K=yellow

ln=B,BLUEBERRY_PIE,Ankh,ANKH,"BLUEBERRY, SUGAR, FLOUR, BUTTER","JADEITE,EMERALD,BRONZE,GOLD",8,ChocolateChips,Alexandrite,ALEXANDRITE,
A=B
B=BLUEBERRY_PIE
C=Ankh
D=ANKH
E=BLUEBERRY, SUGAR, FLOUR, BUTTER
F=JADEITE,EMERALD,BRONZE,GOLD
G=8
H=ChocolateChips
I=Alexandrite
J=ALEXANDRITE
K=

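See This DosTips post for a demonstration of how to extend this technique to parse more than 32 fields.

Hybrid JScript/batch parseCSV.bat utility

Pure batch requires a lot of code that is hard to create on the fly, and it is relatively slow. I have created parseCSV.bat, a hybrid JScript/batch utility that can quickly convert nearly any CSV into a format that FOR /F can easily parse. It even supports newlines within fields.

Of course, parseCSV cannot get around the 8191-character line length limit, and parsing more than 32 tokens still requires extra code.

parseCSV.bat does not use regular expressions.

I will not go into detail about how it works. Full documentation is built into the utility and is available by entering parseCSV /? on the command line. The help output is as follows: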

parseCSV  [/option]...

  Parse stdin as CSV and write it to stdout in a way that can be safely
  parsed by FOR /F. All columns will be enclosed by quotes so that empty
  columns may be preserved. It also supports delimiters, newlines, and
  escaped quotes within quoted values. Two consecutive quotes within a
  quoted value are converted into one quote by default.

  Available options:

    /I:string = Input delimiter. Default is a comma (,)

    /O:string = Output delimiter. Default is a comma (,)

         The entire option must be quoted if specifying poison character
         or whitespace literals as a delimiters for /I or /O.

         Examples:  pipe = "/I:|"
                   space = "/I: "

         Standard JScript escape sequences can also be used.

         Examples:       tab = /I:\t  or  /I:\x09
                   backslash = /I:\\

    /E = Encode output delimiter literal within value as \D
         Encode newline within value as \N
         Encode backslash within value as \S

    /D = escape exclamation point and caret for Delayed expansion
         ! becomes ^!
         ^ becomes ^^

    /L = treat all input quotes as quote Literals

    /Q:QuoteOutputFormat

       Controls output of Quotes, where QuoteOutputFormat may be any
       one of the following:

         L = all columns quoted, quote Literals output as "   (Default)
         E = all columns quoted, quote literals Escaped as ""
         N = No columns quoted, quote literals output as "

       The /Q:E and /Q:N options are useful for transforming data for
       purposes other than parsing by FOR /F

    /U = Write unix style lines with newline (\n) instead of the default
         Windows style of carriage return and linefeed (\r\n).

parseCSV  /?

  Display this help

parseCSV  /V

  Display the version of parseCSV.bat

parseCSV.bat was written by Dave Benham. Updates are available at the original
posting site: http://www.dostips.com/forum/viewtopic.php?f=3&t=5702

Here is how to use parseCSV.bat with the test2.csv above.

@echo off
setlocal enableDelayedExpansion
for /f "tokens=1-11 delims=," %%A in (
  'parseCSV /E /D ^<test2.csv'
) do (
  %= Decode Tokens =%
  for %%a in (A B C D E F G H I J K) do call :decodeToken %%a
  %= Show the results =%
  for %%a in (A B C D E F G H I J K) do echo %%a=!%%a!
  echo(
)
exit /b

:decodeToken
for %%. in (.) do set "%1=%%~%1" !
if defined %1 (
  set "%1=!%1:\D=,!"
  set "%1=!%1:\S=\!"
)
exit /b
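
For comparison, the /E encoding described in the help text (\D for the delimiter, \N for a newline, \S for a backslash) can be decoded in a single pass. The JavaScript sketch below only illustrates that mapping; decodeField is a made-up name, and the comma is assumed to be the output delimiter:

```javascript
// Hypothetical decoder for values written by parseCSV /E.
function decodeField(token, delimiter = ',') {
  const map = { D: delimiter, N: '\n', S: '\\' };
  // One pass over the text, so a decoded backslash is never re-read as an escape.
  return token.replace(/\\([DNS])/g, (match, code) => map[code]);
}

console.log(decodeField('F \\Dx'));      // F ,x
console.log(decodeField('C:\\Stemp'));   // C:\temp
```

The batch :decodeToken routine above does the same job with two sequential substitutions, \D first and then \S, and it does not handle \N.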

See This DosTips post for a demonstration of how to extend this technique to parse more than 32 fields.