Split a CSV file into multiple files, each with the header and a given number of records

Date: 2018-03-29 00:19:05

Tags: csv batch-file split

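I have a huge CSV file that I need to split into smaller CSV files. Each file must keep the header, and no records may be lost. For example, this is the original file: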

 ID    Date   
 1     01/01/2010
 1     02/01/2010
 2     01/01/2010 
 2     05/01/2010
 2     06/01/2010
 3     06/01/2010
 3     07/01/2010
 4     08/01/2010
 4     09/01/2010

If the file is split correctly, data_1.csv should contain the first 5 records and data_2.csv the last 4 records, each below the header line.
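Put differently, the header line has to be repeated in every output file, and the remaining records are cut into groups of at most 5. The Python snippet below only illustrates that target layout and is not a batch solution. split_records is a hypothetical helper, and the sample rows are assumed to be comma-separated in the real file:

```python
def split_records(lines, limit):
    """Return chunks; each chunk is the header followed by at most `limit` records."""
    header, records = lines[0], lines[1:]
    return [[header] + records[i:i + limit] for i in range(0, len(records), limit)]


# Sample data from the question (assumed comma-separated in the actual CSV file).
sample = [
    "ID,Date",
    "1,01/01/2010", "1,02/01/2010",
    "2,01/01/2010", "2,05/01/2010", "2,06/01/2010",
    "3,06/01/2010", "3,07/01/2010",
    "4,08/01/2010", "4,09/01/2010",
]

for n, chunk in enumerate(split_records(sample, 5), start=1):
    # Every chunk starts with the header, so it is not counted as a record.
    print(f"data_{n}.csv: {len(chunk) - 1} records")
# prints:
# data_1.csv: 5 records
# data_2.csv: 4 records
```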

My code only splits by line count and does not keep the header. I don't know how to modify it:

 @echo off
 setLocal EnableDelayedExpansion

 set limit=5
 set file=data.csv
 set lineCounter=1
 set filenameCounter=1

 set name=
 set extension=

 for %%a in (%file%) do (
     set "name=%%~na"
     set "extension=%%~xa"
 )

 for /f "tokens=*" %%a in (%file%) do (
     set splitFile=!name!-part!filenameCounter!!extension!
     if !lineCounter! gtr !limit! (
         set /a filenameCounter=!filenameCounter! + 1
         set lineCounter=1
         echo Created !splitFile!.
     )
     echo %%a>> !splitFile!

     set /a lineCounter=!lineCounter! + 1
 )

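1 Answer:

Answer 0 (score: 1)

Here is an approach similar to yours that uses a for /F loop to read the input file. Performance is not very good, though, because an output file is opened and closed again for every line written to it: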

@echo off
setlocal EnableExtensions DisableDelayedExpansion

rem // Define constants here:
set "_FILE=%~1"   & rem // (first command line argument is input file)
set /A "_LIMIT=5" & rem // (number of records or rows per output file)

rem // Split file name:
set "NAME=%~dpn1" & rem // (path and file name)
set "EXT=%~x1"    & rem // (file name extension)

rem // Split file into multiple ones:
set "HEADER=" & set /A "INDEX=0, COUNT=0"
rem // Read file once:
for /F "usebackq delims=" %%L in ("%_FILE%") do (
    rem // Read header if not done yet:
    if not defined HEADER (
        set "HEADER=%%L"
    ) else (
        set "LINE=%%L"
        rem // Compute line index, previous and current file count:
        set /A "PREV=COUNT, COUNT=INDEX/_LIMIT+1, INDEX+=1"
        rem // Write header once per output file:
        setlocal EnableDelayedExpansion
        if !PREV! lss !COUNT! (
            > "!NAME!_!COUNT!!EXT!" echo/!HEADER!
        )
        rem // Write line:
        >> "!NAME!_!COUNT!!EXT!" echo/!LINE!
        endlocal
    )
)

endlocal
exit /B
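The header is written exactly once per output file because of the line set /A "PREV=COUNT, COUNT=INDEX/_LIMIT+1, INDEX+=1". set /A evaluates the comma-separated assignments from left to right. PREV therefore receives the file number of the previous record, COUNT is computed from the not-yet-incremented INDEX, and a new file (header written with >) is started whenever PREV < COUNT. The Python replay below is only a sketch of that arithmetic. The function names are made up for illustration:

```python
def file_number_per_record(total_records, limit):
    """Mimic COUNT=INDEX/_LIMIT+1: map each 0-based record index to a 1-based file number."""
    # INDEX is never negative here, so // matches set /A integer division.
    return [index // limit + 1 for index in range(total_records)]


def header_writes(total_records, limit):
    """Return the record indices at which a new output file starts (PREV < COUNT)."""
    starts = []
    prev = 0  # COUNT is initialised to 0 before the first record
    for index, count in enumerate(file_number_per_record(total_records, limit)):
        if prev < count:
            starts.append(index)
        prev = count
    return starts


print(file_number_per_record(9, 5))  # [1, 1, 1, 1, 1, 2, 2, 2, 2]
print(header_writes(9, 5))           # [0, 5]
```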

You do not even need a for /F loop for this task. Instead, you can combine set /P with input redirection inside a for /L loop, like this (see the explanatory rem comments):

@echo off
setlocal EnableExtensions DisableDelayedExpansion

rem // Define constants here:
set "_FILE=%~1"   & rem // (first command line argument is input file)
set /A "_LIMIT=5" & rem // (number of records or rows per output file)

rem // Split file name:
set "NAME=%~dpn1" & rem // (path and file name)
set "EXT=%~x1"    & rem // (file name extension)

rem // Determine number of lines excluding header:
for /F %%I in ('^< "%_FILE%" find /V /C ""') do set /A "COUNT=%%I-1"

rem // Split file into multiple ones:
setlocal EnableDelayedExpansion
rem // Read file once:
< "!_FILE!" (
    rem // Read header (first line):
    set /P HEADER=""
    rem // Calculate number of output files:
    set /A "DIV=(COUNT-1)/_LIMIT+1"
    rem // Iterate over output files:
    for /L %%J in (1,1,!DIV!) do (
        rem // Write an output file:
        > "!NAME!_%%J!EXT!" (
            rem // Write header:
            echo/!HEADER!
            rem // Write as many lines as specified:
            for /L %%I in (1,1,%_LIMIT%) do (
                set "LINE=" & set /P LINE=""
                if defined LINE echo/!LINE!
            )
        )
    )
)
endlocal

endlocal
exit /B

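The advantage of this method is that the input file and each output file are opened only once during the split. The line count obtained beforehand with find reads the input one extra time.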
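For comparison, the Python sketch below follows the same idea. It keeps one read handle on the input, opens each output file exactly once, and fills it with up to _LIMIT lines, using readline() where the script uses set /P. This is an illustration, not part of the answer. split_csv is a hypothetical name, and the sketch assumes at least one data record. It differs from the batch script in one respect: the script skips empty lines (if defined LINE), whereas here only end of file stops the copying:

```python
import os
import tempfile


def split_csv(path, limit):
    """Split `path` into <name>_<n><ext> files, each starting with the header line."""
    # Counting pass, like: find /V /C "" minus the header line.
    with open(path) as f:
        count = sum(1 for _ in f) - 1
    name, ext = os.path.splitext(path)  # like %~dpn1 and %~x1
    written = []
    with open(path) as src:  # split pass: the input is opened once
        header = src.readline()  # like: set /P HEADER=""
        files = (count - 1) // limit + 1  # like DIV=(COUNT-1)/_LIMIT+1 (assumes count >= 1)
        for n in range(1, files + 1):
            out_path = f"{name}_{n}{ext}"
            with open(out_path, "w") as dst:  # each output file is opened once
                dst.write(header)
                for _ in range(limit):
                    line = src.readline()  # returns "" only at end of file
                    if line:
                        dst.write(line)
            written.append(out_path)
    return written


# Usage: 9 records with a limit of 5 give two files.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "data.csv")
    with open(src_path, "w") as f:
        f.write("ID,Date\n" + "".join(f"{i},01/01/2010\n" for i in range(1, 10)))
    for p in split_csv(src_path, 5):
        with open(p) as f:
            print(os.path.basename(p), len(f.readlines()) - 1, "records")
```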