I have many CSV files with more than 100,000 lines each, structured like this:
Time,Longitude,Latitude,R,E,M
2016-01-01M12:01:01,39.92234,52.61532,"-11.5,-20.4",-4.5,No
2016-01-01M12:01:01,39.92238,52.61562,"-10.1,-12.7,-9.2,-7.7",,No
2016-01-01M12:01:02,39.92239,52.61552,"-12.1,-12.4",-3.9,No
2016-01-01M12:01:03,39.92248,52.61562,"-3.1,-1.9,-8.2",,No
and so on...
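What I want is to find the maximum number of values between the quotes and rename the column accordingly.
For example, the second row has the most values between quotes, so R should become R1,R2,R3,R4. Finally, the quotes should be removed. All of this should be done with a batch file.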
So the result should look like this:
Time,Longitude,Latitude,R1,R2,R3,R4,E,M
2016-01-01M12:01:01,39.92234,52.61532,-11.5,-20.4,,,-4.5,No
2016-01-01M12:01:01,39.92238,52.61562,-10.1,-12.7,-9.2,-7.7,,No
2016-01-01M12:01:02,39.92239,52.61552,-12.1,-12.4,,,-3.9,No
2016-01-01M12:01:03,39.92248,52.61562,-3.1,-1.9,-8.2,,,No
and so on...
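The rule described above (count the values inside the quotes on every line, keep the maximum, and turn R into R1 to Rn) can be cross-checked with a short Python sketch before writing any batch code. It is only an illustration, not a batch solution: the helper names are hypothetical, and it assumes the quoted column is always the fourth one (index 3). Python's csv module strips the quotes itself, so each R cell arrives as a plain string such as -11.5,-20.4.

```python
import csv
import io

def max_value_count(rows, col):
    """Largest number of comma-separated values in column `col` (0-based)."""
    # csv.reader has already removed the surrounding quotes, so each cell is
    # a plain string such as "-11.5,-20.4"; an empty cell counts as 0 values.
    return max((len(row[col].split(",")) if row[col] else 0 for row in rows), default=0)

def expand_header(header, col, n):
    """Replace header[col] (e.g. "R") with n numbered names, R1 to Rn."""
    name = header[col]
    return header[:col] + [f"{name}{i}" for i in range(1, n + 1)] + header[col + 1:]

SAMPLE = '''Time,Longitude,Latitude,R,E,M
2016-01-01M12:01:01,39.92234,52.61532,"-11.5,-20.4",-4.5,No
2016-01-01M12:01:01,39.92238,52.61562,"-10.1,-12.7,-9.2,-7.7",,No
2016-01-01M12:01:02,39.92239,52.61552,"-12.1,-12.4",-3.9,No
2016-01-01M12:01:03,39.92248,52.61562,"-3.1,-1.9,-8.2",,No
'''

reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)
rows = list(reader)
n = max_value_count(rows, col=3)
new_header = expand_header(header, 3, n)
print(n)                     # 4
print(",".join(new_header))  # Time,Longitude,Latitude,R1,R2,R3,R4,E,M
```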
I have been trying to find an example of how to do this for several weeks now, without success. Maybe someone can help me?
Answer 0 (score: 1)
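Although you have not shown any effort of your own to solve the task, I decided to provide a solution because it seemed like a challenging project. So here is what I came up with: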
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem // Define constants here:
set "_FILE=%~1" & rem // (file to process; use first command line parameter)
rem // Initialise variables:
set /A "MAX=0" & rem // (maximum number of items in between quoted group)
set /A "POS=0" & rem // (position of quoted group)
rem // Pass 1: count maximum number of items within quotes:
set /A "COUNT=0, INDEX=0"
for /F usebackq^ skip^=1^ delims^=^ eol^= %%L in ("%_FILE%") do (
set /A "INDEX=0"
for %%I in (%%L) do (
set "QUOTED=%%I"
set "UNQUOTED=%%~I"
set /A "INDEX+=1"
setlocal EnableDelayedExpansion
if not "!QUOTED!"=="!UNQUOTED!" (
if !POS! leq 0 (
endlocal & set /A "POS=INDEX"
) else endlocal
set "COUNT="
setlocal EnableDelayedExpansion
set "ITEM=%%~I"
for %%J in ("!ITEM:,="^,"!") do (
if not defined COUNT endlocal
set /A "COUNT+=1"
)
setlocal EnableDelayedExpansion
if !MAX! lss !COUNT! (
endlocal & set /A "MAX=COUNT"
) else endlocal
) else endlocal
)
)
rem // Build separators buffer:
set "SEPB=" & setlocal EnableDelayedExpansion
for /L %%E in (1,1,%MAX%) do (
set "SEPB=!SEPB!,"
)
endlocal & set "SEPB=%SEPB%"
rem // Process header:
set /A "INDEX=0"
for /F usebackq^ delims^=^ eol^= %%L in ("%_FILE%") do (
set "COLL=,"
for %%I in (%%L) do (
set /A "INDEX+=1" & set "ITEM=%%~I"
setlocal EnableDelayedExpansion
if !INDEX! equ !POS! (
for /L %%K in (1,1,%MAX%) do (
set "COLL=!COLL!!ITEM!%%K,"
)
) else (
set "COLL=!COLL!!ITEM!,"
)
for /F "delims=" %%E in (""!COLL!"") do (
endlocal & set "COLL=%%~E"
)
)
setlocal EnableDelayedExpansion
echo/!COLL:~1^,-1!
endlocal
goto :NEXT
)
:NEXT
rem // Pass 2: expand items in between quotes:
for /F usebackq^ skip^=1^ delims^=^ eol^= %%L in ("%_FILE%") do (
set "LINE=%%L" & set "COLL=,"
setlocal EnableDelayedExpansion
for %%I in ("!LINE:,="^,"!") do (
endlocal
set "SEPS=%SEPB%" & set "QUOTED=%%~I" & set "UNQUOTED="
for %%J in (%%~I) do (
set "UNQUOTED=%%~J"
setlocal EnableDelayedExpansion
if "!QUOTED!"=="!UNQUOTED!" (
set "COLL=!COLL!!QUOTED!," & set "SEPS="
) else (
set "COLL=!COLL!!UNQUOTED!," & set "SEPS=!SEPS:~,-1!"
)
for /F "delims=" %%E in (""!COLL!"") do (
for /F "delims=" %%F in (""!SEPS!"") do (
endlocal & set "COLL=%%~E" & set "SEPS=%%~F"
)
)
)
if not defined QUOTED set "SEPS=,"
setlocal EnableDelayedExpansion
for /F "delims=" %%K in (""!COLL!!SEPS!"") do (
endlocal & set "COLL=%%~K"
setlocal EnableDelayedExpansion
)
)
echo/!COLL:~1^,-1!
endlocal
)
endlocal
exit /B
Assuming the batch script is saved as resolve-csv.bat in the current directory and the CSV file to process is D:\Test\data.csv, type the following at the Windows command prompt:
resolve-csv.bat "D:\Test\data.csv"
To store the output in another CSV file, for example D:\Test\result.csv, enter:
resolve-csv.bat "D:\Test\data.csv" > "D:\Test\result.csv"
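The script above reads the file twice: pass 1 only finds the widest quoted group (MAX), and pass 2 rewrites every line, filling the missing values of shorter groups with empty fields taken from the separator buffer. Streaming the file twice instead of loading it into memory matters for files with 100,000+ lines. The same two-pass idea is sketched below in Python as a cross-check; `resolve_csv`, the fixed column index 3 and the sample data are assumptions made for illustration, and csv.reader replaces the batch script's hand-written quote handling.

```python
import csv
import io

def resolve_csv(open_source, out, col=3):
    """Split column `col` (0-based) into numbered columns, reading the input twice.

    `open_source` is a zero-argument callable that returns a fresh text stream,
    so the data is streamed twice instead of being held in memory.
    """
    # Pass 1: find the largest number of values in the quoted column.
    with open_source() as src:
        reader = csv.reader(src)
        header = next(reader)
        width = max((len(row[col].split(",")) for row in reader if row[col]), default=0)

    writer = csv.writer(out, lineterminator="\n")
    name = header[col]
    writer.writerow(header[:col] + [f"{name}{i}" for i in range(1, width + 1)] + header[col + 1:])

    # Pass 2: rewrite every data row, padding shorter groups with empty fields.
    with open_source() as src:
        reader = csv.reader(src)
        next(reader)  # skip the header line
        for row in reader:
            values = row[col].split(",") if row[col] else []
            values += [""] * (width - len(values))
            writer.writerow(row[:col] + values + row[col + 1:])

SAMPLE = '''Time,Longitude,Latitude,R,E,M
2016-01-01M12:01:01,39.92234,52.61532,"-11.5,-20.4",-4.5,No
2016-01-01M12:01:01,39.92238,52.61562,"-10.1,-12.7,-9.2,-7.7",,No
2016-01-01M12:01:02,39.92239,52.61552,"-12.1,-12.4",-3.9,No
2016-01-01M12:01:03,39.92248,52.61562,"-3.1,-1.9,-8.2",,No
'''

buf = io.StringIO()
resolve_csv(lambda: io.StringIO(SAMPLE), buf)
result = buf.getvalue()
print(result, end="")
```

With real files, `open_source` could be `lambda: open("data.csv", newline="")` and `out` a file opened with `open("result.csv", "w", newline="")`; those file names are placeholders.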
Answer 1 (score: 0)
This batch file does what you described; the comments should explain it well enough...
@echo off
setlocal EnableDelayedExpansion
rem Process all *.csv files and
rem get the maximum number of values in the _first_ value between quotes
set "firstFile="
set "max=0"
for %%f in (*.csv) do (
if not defined firstFile set "firstFile=%%f"
for /F usebackq^ skip^=1^ tokens^=2^ delims^=^" %%a in ("%%f") do (
set "n=0"
for %%i in (%%a) do set /A n+=1
if !n! gtr !max! set "max=!n!"
)
)
rem Read header from first input file and generate header of output file
rem changing the name of the _fourth_ column
set /P "header=" < "%firstFile%"
for /F "tokens=1-4* delims=," %%a in ("%header%") do (
set "header=%%a,%%b,%%c"
for /L %%i in (1,1,%max%) do set "header=!header!,%%d%%i"
set "header=!header!,%%e"
)
rem Process all files and generate output file
rem removing quotes from the _first_ value between quotes
(
echo %header%
for %%f in (*.csv) do (
for /F usebackq^ skip^=1^ tokens^=1-2*^ delims^=^" %%a in ("%%f") do (
set "n=%max%"
for %%i in (%%~b) do set /A n-=1
set "second=%%~b"
for /L %%i in (!n!,-1,1) do set "second=!second!,"
echo %%a!second!%%c
)
)
) > output.tmp
Of course, if your real data is structured differently from the sample data, this program will fail...
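The second batch file does not parse CSV at all: it splits each line at the double quotes (tokens 1-2* with " as the delimiter), counts the values in the middle part, and appends max minus count commas. The header is taken from the first file, with only the fourth column renamed. The sketch below mirrors that string-splitting logic in Python to show why the warning above applies; the helper names and the list of line lists standing in for the *.csv files are hypothetical. A data line without a quoted group does not fit this layout, and the sketch raises ValueError for it.

```python
def pad_quoted_line(line, width):
    """Split at the first two double quotes and pad the middle part to `width` values."""
    # before: text up to the opening quote (ends with a comma)
    # inside: the quoted values, e.g. -11.5,-20.4
    # after:  rest of the line, starting with the comma after the closing quote
    # A line without quotes cannot be unpacked into three parts -> ValueError.
    before, inside, after = line.split('"', 2)
    count = len(inside.split(","))
    return before + inside + "," * (width - count) + after

def global_width(files):
    """Largest value count of the first quoted group over all data lines of all files."""
    width = 0
    for lines in files:
        for line in lines[1:]:  # skip each file's header line
            parts = line.split('"')
            if len(parts) >= 3:  # only lines that contain a quoted group
                width = max(width, len(parts[1].split(",")))
    return width

def combine(files):
    """Header from the first file, then every data line of every file, padded."""
    width = global_width(files)
    # Rename only the fourth header column, like tokens=1-4* in the batch file.
    a, b, c, d, rest = files[0][0].split(",", 4)
    header = ",".join([a, b, c] + [f"{d}{i}" for i in range(1, width + 1)] + [rest])
    return [header] + [pad_quoted_line(line, width) for lines in files for line in lines[1:]]

files = [
    ["Time,Longitude,Latitude,R,E,M",
     '2016-01-01M12:01:01,39.92234,52.61532,"-11.5,-20.4",-4.5,No'],
    ["Time,Longitude,Latitude,R,E,M",
     '2016-01-01M12:01:01,39.92238,52.61562,"-10.1,-12.7,-9.2,-7.7",,No'],
]
for out_line in combine(files):
    print(out_line)
```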