Remove quote marks from a CSV and change the column names accordingly using a batch file?

Asked: 2018-03-19 10:58:55

Tags: csv batch-file

I have many CSV files with more than 100,000 rows each, structured similarly to this:

Time,Longitude,Latitude,R,E,M
2016-01-01M12:01:01,39.92234,52.61532,"-11.5,-20.4",-4.5,No
2016-01-01M12:01:01,39.92238,52.61562,"-10.1,-12.7,-9.2,-7.7",,No
2016-01-01M12:01:02,39.92239,52.61552,"-12.1,-12.4",-3.9,No
2016-01-01M12:01:03,39.92248,52.61562,"-3.1,-1.9,-8.2",,No

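And so on...

What I want is to find the maximum number of values that appear between quotes in any row and change the column names accordingly.

For example, the second data row has the most values between quotes (four), so R should become R1,R2,R3,R4. Finally, the quotes should be removed. All of this should be done with a batch file.

So the result should look like this: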

Time,Longitude,Latitude,R1,R2,R3,R4,E,M
2016-01-01M12:01:01,39.92234,52.61532,-11.5,-20.4,,,-4.5,No
2016-01-01M12:01:01,39.92238,52.61562,-10.1,-12.7,-9.2,-7.7,,No
2016-01-01M12:01:02,39.92239,52.61552,-12.1,-12.4,,,-3.9,No
2016-01-01M12:01:03,39.92248,52.61562,-3.1,-1.9,-8.2,,,No

And so on...
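To make the request concrete, here is a small sketch of the header change only. It is written in Python rather than batch, so it illustrates the logic and is not a solution in the requested language. It assumes the quoted group is always the fourth column, which holds for the sample above; the names `sample`, `width`, and `header` are made up for this illustration.

```python
import csv
import io

# Sample rows copied from the question.
# Assumption: the quoted group is always column 4 (index 3).
sample = '''Time,Longitude,Latitude,R,E,M
2016-01-01M12:01:01,39.92234,52.61532,"-11.5,-20.4",-4.5,No
2016-01-01M12:01:01,39.92238,52.61562,"-10.1,-12.7,-9.2,-7.7",,No
2016-01-01M12:01:02,39.92239,52.61552,"-12.1,-12.4",-3.9,No
2016-01-01M12:01:03,39.92248,52.61562,"-3.1,-1.9,-8.2",,No
'''

rows = list(csv.reader(io.StringIO(sample)))
head, data = rows[0], rows[1:]

# Largest number of comma-separated values found inside the quoted field.
width = max(len(row[3].split(",")) for row in data)

# Replace the single column name R with R1..Rn.
header = head[:3] + [f"{head[3]}{i}" for i in range(1, width + 1)] + head[4:]
print(",".join(header))  # Time,Longitude,Latitude,R1,R2,R3,R4,E,M
```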

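I have been looking for an example of how to do this for weeks, without success. Maybe someone can help me?

2 answers:

Answer 0: (score: 1)

Although you did not show any effort to solve the task yourself, I decided to provide a solution because it seemed like a challenging project. So here is what I came up with: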

@echo off
setlocal EnableExtensions DisableDelayedExpansion

rem // Define constants here:
set "_FILE=%~1" & rem // (file to process; use first command line parameter)

rem // Initialise variables:
set /A "MAX=0" & rem // (maximum number of items in between quoted group)
set /A "POS=0" & rem // (position of quoted group)

rem // Pass 1: count maximum number of items within quotes:
set /A "COUNT=0, INDEX=0"
for /F usebackq^ skip^=1^ delims^=^ eol^= %%L in ("%_FILE%") do (
    for %%I in (%%L) do (
        set "QUOTED=%%I"
        set "UNQUOTED=%%~I"
        set /A "INDEX+=1"
        setlocal EnableDelayedExpansion
        if not "!QUOTED!"=="!UNQUOTED!" (
            if !POS! leq 0 (
                endlocal & set /A "POS=INDEX"
            ) else endlocal
            set "COUNT="
            setlocal EnableDelayedExpansion
            set "ITEM=%%~I"
            for %%J in ("!ITEM:,="^,"!") do (
                if not defined COUNT endlocal
                set /A "COUNT+=1"
            )
            setlocal EnableDelayedExpansion
            if !MAX! lss !COUNT! (
                endlocal & set /A "MAX=COUNT"
            ) else endlocal
        ) else endlocal
    )
)

rem // Build separators buffer:
set "SEPB=" & setlocal EnableDelayedExpansion
for /L %%E in (1,1,%MAX%) do (
    set "SEPB=!SEPB!,"
)
endlocal & set "SEPB=%SEPB%"

rem // Process header:
set /A "INDEX=0"
for /F usebackq^ delims^=^ eol^= %%L in ("%_FILE%") do (
    set "COLL=,"
    for %%I in (%%L) do (
        set /A "INDEX+=1" & set "ITEM=%%~I"
        setlocal EnableDelayedExpansion
        if !INDEX! equ !POS! (
            for /L %%K in (1,1,%MAX%) do (
                set "COLL=!COLL!!ITEM!%%K,"
            )
        ) else (
            set "COLL=!COLL!!ITEM!,"
        )
        for /F "delims=" %%E in (""!COLL!"") do (
            endlocal & set "COLL=%%~E"
        )
    )
    setlocal EnableDelayedExpansion
    echo/!COLL:~1^,-1!
    endlocal
    goto :NEXT
)
:NEXT

rem // Pass 2: expand items in between quotes:
for /F usebackq^ skip^=1^ delims^=^ eol^= %%L in ("%_FILE%") do (
    set "LINE=%%L" & set "COLL=,"
    setlocal EnableDelayedExpansion
    for %%I in ("!LINE:,="^,"!") do (
        endlocal
        set "SEPS=%SEPB%" & set "QUOTED=%%~I" & set "UNQUOTED="
        for %%J in (%%~I) do (
            set "UNQUOTED=%%~J"
            setlocal EnableDelayedExpansion
            if "!QUOTED!"=="!UNQUOTED!" (
                set "COLL=!COLL!!QUOTED!," & set "SEPS="
            ) else (
                set "COLL=!COLL!!UNQUOTED!," & set "SEPS=!SEPS:~,-1!"
            )
            for /F "delims=" %%E in (""!COLL!"") do (
                for /F "delims=" %%F in (""!SEPS!"") do (
                    endlocal & set "COLL=%%~E" & set "SEPS=%%~F"
                )
            )
        )
        if not defined QUOTED set "SEPS=,"
        setlocal EnableDelayedExpansion
        for /F "delims=" %%K in (""!COLL!!SEPS!"") do (
            endlocal & set "COLL=%%~K"
            setlocal EnableDelayedExpansion
        )
    )
    echo/!COLL:~1^,-1!
    endlocal
)

endlocal
exit /B

Assuming the batch script is saved as resolve-csv.bat in the current directory and the CSV file to process is D:\Test\data.csv, type the following at the Windows command prompt:

resolve-csv.bat "D:\Test\data.csv"

To store the output in another CSV file, for example D:\Test\result.csv, type:

resolve-csv.bat "D:\Test\data.csv" > "D:\Test\result.csv"
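The script above works in two passes: the first finds the largest number of values inside the quoted group, the second rewrites the header and pads every row to that width. For readers who want to cross-check the batch output, the same two-pass idea can be sketched in Python. This is not the batch code above translated line by line; the function name `expand_quoted_column` and the `col` parameter are hypothetical, and the sketch assumes the quoted group is always at the same column index (3 in the sample data).

```python
import csv


def expand_quoted_column(src_path, dst_path, col=3):
    """Split the quoted field at index `col` into separate, padded columns.

    Assumption: every data row carries its quoted group at index `col`.
    Returns the number of columns the group was expanded to.
    """
    # Pass 1: find the largest number of values inside the quoted field.
    with open(src_path, newline="") as src:
        reader = csv.reader(src)
        next(reader, None)  # skip the header line
        width = max(
            (len(row[col].split(",")) for row in reader if row),
            default=1,
        )

    # Pass 2: rewrite header and rows one at a time (no full file in memory).
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst, lineterminator="\n")
        header = next(reader)
        names = [f"{header[col]}{i}" for i in range(1, width + 1)]
        writer.writerow(header[:col] + names + header[col + 1:])
        for row in reader:
            if not row:
                continue  # ignore blank lines
            values = row[col].split(",")
            values += [""] * (width - len(values))  # pad short groups
            writer.writerow(row[:col] + values + row[col + 1:])
    return width
```

With the paths used above, a call would look like `expand_quoted_column(r"D:\Test\data.csv", r"D:\Test\result.csv")`.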

Answer 1: (score: 0)

This batch file does what you describe; the comments should be explanation enough...

@echo off
setlocal EnableDelayedExpansion

rem Process all *.csv files and
rem get the maximum number of values in the _first_ value between quotes
set "firstFile="
set "max=0"
for %%f in (*.csv) do (
   if not defined firstFile set "firstFile=%%f"
   for /F usebackq^ skip^=1^ tokens^=2^ delims^=^" %%a in ("%%f") do (
      set "n=0"
      for %%i in (%%a) do set /A n+=1
      if !n! gtr !max! set "max=!n!"
   )
)

rem Read header from first input file and generate header of output file
rem changing the name of the _fourth_ column
set /P "header=" < "%firstFile%"
for /F "tokens=1-4* delims=," %%a in ("%header%") do (
   set "header=%%a,%%b,%%c"
   for /L %%i in (1,1,%max%) do set "header=!header!,%%d%%i"
   set "header=!header!,%%e"
)

rem Process all files and generate output file
rem removing quotes from the _first_ value between quotes
(
echo %header%
for %%f in (*.csv) do (
   for /F usebackq^ skip^=1^ tokens^=1-2*^ delims^=^" %%a in ("%%f") do (
      set "n=%max%"
      for %%i in (%%~b) do set /A n-=1
      set "second=%%~b"
      for /L %%i in (!n!,-1,1) do set "second=!second!,"
      echo %%a!second!%%c
   )
)
) > output.tmp

Of course, if your real data is structured differently from the sample data, this program will fail...
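Since the batch file depends on that structure, it may be worth checking the input first. The sketch below is in Python, not batch, and only approximates the assumptions as I read them: each data line contains exactly one quoted group, and that group starts in the fourth column. The function name `find_unexpected_rows` and its `col` parameter are hypothetical.

```python
def find_unexpected_rows(lines, col=3):
    """Return 1-based numbers of data lines that break the assumed layout.

    Assumed layout: exactly one quoted group per data line, starting at
    column index `col` (that is, `col` commas appear before the first quote).
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        if number == 1:
            continue  # header line, not checked
        line = line.rstrip("\r\n")
        if not line:
            continue  # blank lines are ignored
        parts = line.split('"')
        # Exactly one quoted group means two quote characters, so three parts.
        one_group = len(parts) == 3
        if not (one_group and parts[0].count(",") == col):
            bad.append(number)
    return bad
```

Called as `find_unexpected_rows(open("data.csv"))`, an empty list would suggest the file matches the sample layout; any returned line numbers point at rows to inspect before running the batch file.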