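I am currently trying to parse a CSV file in a batch script, but parsing fails because there are too many commas inside the "------,----" field at the beginning of each row. In addition, some of the CSV files do not contain this field, so I cannot simply shift the tokens over. Here is a sample of the CSV file: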
Datasheets,Image,Digi-Key Part Number,Manufacturer Part Number,Manufacturer,Description,Quantity Available,Factory Stock,Unit Price (USD),@ qty,Minimum Quantity,"Packaging","Series","Part Status","Capacitance","Tolerance","Voltage - Rated","Dielectric Material","Number of Capacitors","Circuit Type","Temperature Coefficient","Ratings","Mounting Type","Package / Case","Size / Dimension","Height - Seated (Max)"
"//media.digikey.com/pdf/Data%20Sheets/Panasonic%20Capacitors%20PDFs/ECJ-R,ECJ-T_4-Array.pdf",//media.digikey.com/photos/Panasonic%20Photos/ECJ-R%201206%20SERIES.jpg,P10582TR-ND,ECJ-RVC1H150K,Panasonic Electronic Components,CAP ARRAY 15PF 50V NP0 1206,0,0,"Obsolete","0","4000","Tape & Reel (TR)","ECJ-R","Obsolete","15pF","±10%","50V","Ceramic","4","Isolated","C0G, NP0","-","Surface Mount","1206 (3216 Metric)","0.126"" L x 0.063"" W (3.20mm x 1.60mm)","0.037"" (0.95mm)"
"//media.digikey.com/pdf/Data%20Sheets/Panasonic%20Capacitors%20PDFs/ECJ-R,ECJ-T_4-Array.pdf",//media.digikey.com/photos/Panasonic%20Photos/ECJ-R%201206%20SERIES.jpg,P10582CT-ND,ECJ-RVC1H150K,Panasonic Electronic Components,CAP ARRAY 15PF 50V NP0 1206,1801,0,"0.45000","0","1","Cut Tape (CT)","ECJ-R","Obsolete","15pF","±10%","50V","Ceramic","4","Isolated","C0G, NP0","-","Surface Mount","1206 (3216 Metric)","0.126"" L x 0.063"" W (3.20mm x 1.60mm)","0.037"" (0.95mm)"
"//media.digikey.com/pdf/Data%20Sheets/Panasonic%20Capacitors%20PDFs/ECJ-R,ECJ-T_4-Array.pdf",//media.digikey.com/photos/Panasonic%20Photos/ECJ-R%201206%20SERIES.jpg,P10582DKR-ND,ECJ-RVC1H150K,Panasonic Electronic Components,CAP ARRAY 15PF 50V NP0 1206,1801,0,"Digi-Reel","0","1","Digi-Reel®","ECJ-R","Obsolete","15pF","±10%","50V","Ceramic","4","Isolated","C0G, NP0","-","Surface Mount","1206 (3216 Metric)","0.126"" L x 0.063"" W (3.20mm x 1.60mm)","0.037"" (0.95mm)"
"//media.digikey.com/pdf/Data%20Sheets/Panasonic%20Capacitors%20PDFs/ECJ-R,ECJ-T_4-Array.pdf",//media.digikey.com/photos/Panasonic%20Photos/ECJ-R%201206%20SERIES.jpg,P10580TR-ND,ECJ-RVC1H100F,Panasonic Electronic Components,CAP ARRAY 10PF 50V NP0 1206,0,0,"Obsolete","0","4000","Tape & Reel (TR)","ECJ-R","Obsolete","10pF","±1pF","50V","Ceramic","4","Isolated","C0G, NP0","-","Surface Mount","1206 (3216 Metric)","0.126"" L x 0.063"" W (3.20mm x 1.60mm)","0.037"" (0.95mm)"
"//media.digikey.com/pdf/Data%20Sheets/Panasonic%20Capacitors%20PDFs/ECJ-R,ECJ-T_4-Array.pdf",//media.digikey.com/photos/Panasonic%20Photos/ECJ-R%201206%20SERIES.jpg,P10580CT-ND,ECJ-RVC1H100F,Panasonic Electronic Components,CAP ARRAY 10PF 50V NP0 1206,0,0,"Obsolete","0","1","Cut Tape (CT)","ECJ-R","Obsolete","10pF","±1pF","50V","Ceramic","4","Isolated","C0G, NP0","-","Surface Mount","1206 (3216 Metric)","0.126"" L x 0.063"" W (3.20mm x 1.60mm)","0.037"" (0.95mm)"
Here is a sample of my code:
FOR /F "skip=1 tokens=3-6 delims=, " %%A IN (File.csv) DO (
    ECHO %%A,%%B,%%D,%%C
)
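To see why a loop like this misreads the rows, it can help to run the same kind of split on a single line. The sketch below uses a made-up row (the `row` variable and its contents are illustrative, not taken from the real file) whose first field is quoted and contains a comma. `FOR /F` treats that comma as a delimiter like any other, so every later token shifts by one.

```batch
@echo off
setlocal enableDelayedExpansion
rem Hypothetical sample row: the first field is quoted and contains a comma.
set "row="a,b",c,d"
rem Delayed expansion keeps the embedded quotes away from the command parser.
for /f "tokens=1-3 delims=," %%A in ("!row!") do (
  echo token 1: %%A
  echo token 2: %%B
  echo token 3: %%C
)
```

The quoted field comes back as two tokens (`"a` and `b"`), and `c` lands in the third position instead of the second.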
Answer 0 (score: 2)
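This is an interesting problem. A few weeks ago I solved a very similar problem where a FOR /F needed to parse a CSV with commas inside the values. My answer there includes a pure batch solution, and it also explains many of the issues that make pure batch CSV parsing difficult.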
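I have refactored that technique into the reusable :processLine and :decodeToken routines below. These routines require delayed expansion to be enabled before the main processing loop. The technique is designed to put each FOR /F token value into a like-named environment variable. Enclosing quotes are removed, and any doubled "" within a value is reduced to a single ".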
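The outer loop at the top calls the routines, doubles every ", reorders the fields, and encloses each field in quotes. The outer loop can easily be restructured to do whatever you need. The :processLine and :decodeToken routines at the bottom never need to change.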
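The code below is 5 times faster than aschipfl's answer. The output is identical, except that my code encloses every field in quotes, even those that do not need it, which is perfectly acceptable CSV.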
@echo off
setlocal enableDelayedExpansion
for /f usebackq^ delims^=^ eol^= %%A in ("test.csv") do (
  call :processLine A ln
  for /f "tokens=3-6 delims=," %%A in ("!ln!") do (
    for %%v in (A B C D) do call :decodeToken %%v
    echo "!A:"=""!","!B:"=""!","!D:"=""!","!C:"=""!"
  )
)
exit /b
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: The following routines will work for any CSV as long as no field contains \n
:: and no line approaches the 8191 character limit.
:processLine forVarCharIn envVarOut
::
:: Prepares CSV line stored in FOR variable %%forVarCharIn to be safely parsed by
:: FOR /F with delayed expansion enabled. The result is stored in environment
:: variable envVarOut.
::
:: All "" become "
:: All @ become @a
:: All quoted , become @c
:: All ^ become ^^
:: All ! become ^!
:: All fields are enclosed within quotes
::
setlocal
setlocal disableDelayedExpansion
for %%. in (.) do set "ln=%%%1"
set "ln=,%ln:"=""%,"
set "ln=%ln:^=^^^^%"
set "ln=%ln:&=^&%"
set "ln=%ln:|=^|%"
set "ln=%ln:<=^<%"
set "ln=%ln:>=^>%"
set "ln=%ln:!=^^!%"
set "ln=%ln:,=^,^,%"
set ^"ln=%ln:""="%^"
set "ln=%ln:"=""%"
set "ln=%ln:@=@a%"
set "ln=%ln:^,^,=@c%"
endlocal & set "ln=%ln:""="%" !
set "ln=!ln:,,"=,,!"
set "ln=!ln:",,=,,!"
set "ln=!ln:~2,-2!"
set "ln=!ln:^=^^^^!"
endlocal&set "%2=%ln:!=^^^!%"
set "%2=!%2:""="!"
set "%2="!%2:,,=","!"" !
exit /b
:decodeToken V
::
:: Decodes field in %%V and stores in environment variable V
:: All @c become ,
:: All @a become @
::
for %%. in (.) do set "%1=%%~%1" !
if defined %1 (
  set "%1=!%1:@c=,!"
  set "%1=!%1:@a=@!"
)
exit /b
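The `@a`/`@c` substitution scheme listed in the routine comments can be tried in isolation. The sketch below is not part of the original routines: it skips all quote handling and uses a made-up value (`val`, `enc` and `dec` are illustrative names) only to show why `@` has to be escaped before commas are encoded, and why decoding runs in the opposite order.

```batch
@echo off
setlocal enableDelayedExpansion
rem Hypothetical field value containing both a literal @c and a comma.
set "val=x@c,y"
rem Encode: escape @ first so that @c can then safely stand for a comma.
set "enc=!val:@=@a!"
set "enc=!enc:,=@c!"
echo encoded: !enc!
rem Decode in reverse order: restore commas first, then the escape character.
set "dec=!enc:@c=,!"
set "dec=!dec:@a=@!"
echo decoded: !dec!
```

The encoded form contains no commas at all, so `FOR /F` with `delims=,` can no longer split inside the field, and decoding gives back the original value.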
If you are sure that no value contains a " literal, then the loop at the top can be simplified to:
@echo off
setlocal enableDelayedExpansion
for /f usebackq^ delims^=^ eol^= %%A in ("test.csv") do (
  call :processLine A ln
  for /f "tokens=3-6 delims=," %%A in ("!ln!") do (
    for %%v in (A B C D) do call :decodeToken %%v
    echo "!A!","!B!","!D!","!C!"
  )
)
exit /b
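The difference between the two loops is the `!A:"=""!` style substitution, which re-doubles any quote literal so that the output stays valid CSV. A small sketch of that substitution on its own, using a made-up value modeled on the Size / Dimension column of the sample data (a column that the loop above does not print):

```batch
@echo off
setlocal enableDelayedExpansion
rem Hypothetical decoded field value that contains two quote literals.
set "A=0.126" L x 0.063" W"
rem Double every quote, then wrap the whole field in quotes for CSV output.
echo "!A:"=""!"
```

This prints `"0.126"" L x 0.063"" W"`. When no value can contain a quote, the substitution has nothing to do and can be dropped.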
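Better yet, since none of the columns you want to keep contains @ or , or ", the top loop can be simplified much further, without using :decodeToken at all, which improves performance by a factor of 2 (10 times faster than aschipfl's answer).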
These routines will work with any CSV as long as no CSV value contains a newline and no processed line exceeds the 8191-character limit imposed by batch.
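Also, all simple FOR /F techniques are limited to parsing at most 32 tokens. On DosTips I demonstrate how to parse and process hundreds of CSV fields. It requires some sophisticated batch coding, but the routines are again reusable, so the outer loop stays manageable.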
Answer 1 (score: 1)
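Here is a pure batch-file approach that extracts and rearranges specified columns of a CSV file. The column indexes and their order are defined in the constant _LIST at the top of the script: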
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem // Define constants here:
set "_FILE=%~1" & rem // (input CSV file; `%~1` is first argument)
set "_LIST=3 4 6 5" & rem // (list of one-based column indexes to return)
rem // Define temporary replacements into pseudo-array `$REPL[]`:
call :SUBSTARR $REPL
rem // Read input CSV file line by line:
for /F "delims=" %%L in ('findstr /N "^" "%_FILE%"') do (
    set "LINE=%%L"
    set /A "INUM=0, LNUM=LINE"
    setlocal EnableDelayedExpansion
    set "LINE=!LINE:*:=!"
    rem // Temporarily substitute standard token delimiters but `,`:
    if defined LINE set "LINE=!LINE:\=\b!"
    call :REPLCHAR LINE LINE "^!" "\m"
    for /F "tokens=2* delims=[=]" %%M in ('set $REPL') do (
        if "%%N" == "" (
            call :REPLCHAR LINE LINE "=" "%%M"
        ) else if "%%N" == "*" (
            call :REPLCHAR LINE LINE "*" "%%M"
        ) else (
            if defined LINE set "LINE=!LINE:%%N=%%M!"
        )
    )
    rem // Split line (row) into comma-separated items (fields, cells):
    for %%I in ('!LINE:^,^='^,'!') do (
        endlocal
        set /A "INUM+=1"
        set "ITEM=%%I"
        setlocal EnableDelayedExpansion
        set "ITEM=!ITEM:','=,!"
        for /F "delims=" %%J in ("$ITEM[!INUM!]=!ITEM:~1,-1!") do (
            endlocal & set "%%J"
            setlocal EnableDelayedExpansion
        )
    )
    rem // Rebuild line (row) as per specified list of column indexes:
    set "LINE=," & for %%I in (%_LIST%) do (
        if %%I gtr 0 if %%I leq !INUM! (
            set "LINE=!LINE!!$ITEM[%%I]!,"
        ) else set "LINE=!LINE!,"
    )
    rem // Revert substitution of standard token delimiters but `,`:
    for /F "tokens=2* delims=[=]" %%M in ('set $REPL') do (
        if "%%N" == "" (
            set "LINE=!LINE:%%M==!"
        ) else (
            set "LINE=!LINE:%%M=%%N!"
        )
    )
    call :REPLCHAR LINE LINE "\m" "^!"
    set "LINE=!LINE:\b=\!"
    rem // Return modified line (row):
    >&2 < nul set /P ="!LNUM!:"
    echo(!LINE:~1^,-1!
    endlocal
)
endlocal
exit /B
:NONPRINT
rem // Obtain several non-printable characters:
for /F "tokens=1-8 delims=#" %%S in ('
    forfiles /P "%~dp0." /M "%~nx0" /C ^
        "cmd /C echo/0x08#0x09#0x0B#0x0C#0x1A#0x1B#0x7F#0xFF"
') do (
    rem // Get back-space, horizontal & vertical tabulators and form-feed:
    set "_BS=%%S" & set "_HT=%%T" & set "_VT=%%U" & set "_FF=%%V"
    rem // Get substitute (end-of-file), escape, delete and fixed space:
    set "_SS=%%W" & set "_ES=%%X" & set "_DE=%%Y" & set "_XX=%%Z"
)
exit /B
:SUBSTARR <rtn_array>
rem // Obtain non-printable token delimiters:
call :NONPRINT
rem // Define substitutions by a pseudo-array:
for %%R in (
    "[\i]=;"
    "[\e]=="
    "[\s]= "
    "[\t]=%_HT%"
    "[\v]=%_VT%"
    "[\f]=%_FF%"
    "[\x]=%_XX%"
) do set "%~1%%~R"
rem // Define wildcards as substitutions too:
set "%~1[\a]=*"
set "%~1[\q]=?"
set "%~1[\l]=<"
set "%~1[\g]=>"
rem set "%~1[\m]=!"
rem set "%~1[\b]=\"
rem set "%~1[\c]=,"
exit /B
:LENGTH <rtn_length> <ref_string>
rem // Determine length of a string:
setlocal EnableDelayedExpansion
set "STR=!%~2!"
if not defined STR (set /A "LEN=0") else (set /A "LEN=1")
for %%L in (4096 2048 1024 512 256 128 64 32 16 8 4 2 1) do (
    if defined STR (
        set "INT=!STR:~%%L!"
        if not "!INT!" == "" set /A "LEN+=%%L" & set "STR=!INT!"
    )
)
endlocal & set "%~1=%LEN%"
exit /B
:REPLCHAR <rtn_string> <ref_string> <val_char> <val_replace>
rem // Replace given character in a string by another string:
setlocal
set "DXF=!"
setlocal DisableDelayedExpansion
set "CHR=%~3"
set "RPL=%~4"
setlocal EnableDelayedExpansion
set "STR=!%~2!"
if defined CHR (
    call :LENGTH LEN STR
    call :LENGTH LCH CHR
    set /A "LEN-=1" & for /L %%P in (!LEN!,-1,0) do (
        for %%O in (!LCH!) do (
            if "!STR:~%%P,%%O!" == "!CHR!" (
                set /A "INC=%%P+%%O" & for %%Q in (!INC!) do (
                    set "STR=!STR:~,%%P!!RPL!!STR:~%%Q!"
                )
            )
        )
    )
)
if not defined DXF if defined STR set "STR=!STR:"=""!"
if not defined DXF if defined STR set "STR=!STR:^=^^^^!"
if not defined DXF if defined STR set "STR=%STR:!=^^^!%" !
if not defined DXF if defined STR set "STR=!STR:""="!"
for /F "delims=" %%E in (^""!STR!"^") do (
    endlocal & endlocal & endlocal & set "%~1=%%~E" !
)
exit /B
The complicated part is handling unquoted and quoted separators (,) correctly; that explains the size of this script.
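By contrast, the column-selection step driven by _LIST is simple. The sketch below isolates that step under the assumption that a row has already been split into a `$ITEM[]` pseudo-array; the six single-letter fields are made up for illustration, and none of the delimiter substitution from the full script is included.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Hypothetical row that has already been split into six fields.
set "INUM=0"
for %%F in (a b c d e f) do (
  set /A "INUM+=1"
  set "$ITEM[!INUM!]=%%F"
)
rem Same index list as in the script: columns 3, 4, 6, 5 in that order.
set "_LIST=3 4 6 5"
set "LINE="
for %%I in (%_LIST%) do (
  if %%I gtr 0 if %%I leq !INUM! (
    set "LINE=!LINE!!$ITEM[%%I]!,"
  ) else set "LINE=!LINE!,"
)
rem Drop the trailing comma before printing.
set "LINE=!LINE:~0,-1!"
echo(!LINE!
```

With these sample fields the rebuilt row is `c,d,f,e`. An index greater than the number of fields contributes an empty field instead of raising an error.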
Given that the script is named reconstruct-csv.bat and the input CSV file is named File.csv, run it with the following command line:
reconstruct-csv.bat "File.csv"
To write the output to another CSV file, say File_NEW.csv, instead of displaying it, use this:
reconstruct-csv.bat "File.csv" > "File_NEW.csv"