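I have a huge CSV file that I need to split into smaller CSV files, keeping the header in each file and making sure that no records are lost. For example, this is the original file: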
ID Date
1 01/01/2010
1 02/01/2010
2 01/01/2010
2 05/01/2010
2 06/01/2010
3 06/01/2010
3 07/01/2010
4 08/01/2010
4 09/01/2010
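If the file is split correctly, I should see the first 5 records in data_1.csv and the last 4 records in data_2.csv.
My code only splits by line count and does not keep the header. I don't know how to modify it: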
@echo off
setLocal EnableDelayedExpansion
set limit=5
set file=data.csv
set lineCounter=1
set filenameCounter=1
set name=
set extension=
for %%a in (%file%) do (
    set "name=%%~na"
    set "extension=%%~xa"
)
for /f "tokens=*" %%a in (%file%) do (
    set splitFile=!name!-part!filenameCounter!!extension!
    if !lineCounter! gtr !limit! (
        set /a filenameCounter=!filenameCounter! + 1
        set lineCounter=1
        echo Created !splitFile!.
    )
    echo %%a>> !splitFile!
    set /a lineCounter=!lineCounter! + 1
)
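Answer 0 (score: 1)
Here is an approach similar to yours, using a for /F loop to read the input file. The performance is not great, however, because the output file is opened and closed again for every single line written: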
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem // Define constants here:
set "_FILE=%~1" & rem // (first command line argument is input file)
set /A "_LIMIT=5" & rem // (number of records or rows per output file)
rem // Split file name:
set "NAME=%~dpn1" & rem // (path and file name)
set "EXT=%~x1" & rem // (file name extension)
rem // Split file into multiple ones:
set "HEADER=" & set /A "INDEX=0, COUNT=0"
rem // Read file once:
for /F "usebackq delims=" %%L in ("%_FILE%") do (
    rem // Read header if not done yet:
    if not defined HEADER (
        set "HEADER=%%L"
    ) else (
        set "LINE=%%L"
        rem // Compute line index, previous and current file count:
        set /A "PREV=COUNT, COUNT=INDEX/_LIMIT+1, INDEX+=1"
        rem // Write header once per output file:
        setlocal EnableDelayedExpansion
        if !PREV! lss !COUNT! (
            > "!NAME!_!COUNT!!EXT!" echo/!HEADER!
        )
        rem // Write line:
        >> "!NAME!_!COUNT!!EXT!" echo/!LINE!
        endlocal
    )
)
endlocal
exit /B
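The set /A line in the script above carries most of the logic, so it may help to see the same arithmetic outside batch. The Python snippet below is only an illustrative sketch, not part of the answer's script: assign_chunks and sample are made-up names, and it keeps everything in memory instead of writing files. It maps each 0-based record index to a 1-based file number with index // limit + 1 and starts every new file number with the header.

```python
def assign_chunks(lines, limit):
    """Mirror the batch arithmetic COUNT = INDEX / _LIMIT + 1 on a list of lines.

    Hypothetical helper: returns {file_number: [header, record, ...]}.
    """
    header, records = lines[0], lines[1:]
    chunks = {}
    for index, record in enumerate(records):  # index is 0-based, like INDEX
        count = index // limit + 1            # 1-based output file number
        # The first record of a new file number also brings the header along.
        chunks.setdefault(count, [header]).append(record)
    return chunks


# Sample data taken from the question.
sample = [
    "ID Date",
    "1 01/01/2010", "1 02/01/2010", "2 01/01/2010", "2 05/01/2010",
    "2 06/01/2010", "3 06/01/2010", "3 07/01/2010", "4 08/01/2010",
    "4 09/01/2010",
]
for number, content in assign_chunks(sample, 5).items():
    print(number, len(content) - 1)  # prints "1 5" then "2 4"
```

With a limit of 5, records 0 to 4 land in file 1 and records 5 to 8 in file 2, which matches the 5/4 split the question asks for.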
To accomplish the task you do not even need a for /F loop; instead, you can use set /P together with input redirection inside a for /L loop, like this (see the explanatory rem remarks):
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem // Define constants here:
set "_FILE=%~1" & rem // (first command line argument is input file)
set /A "_LIMIT=5" & rem // (number of records or rows per output file)
rem // Split file name:
set "NAME=%~dpn1" & rem // (path and file name)
set "EXT=%~x1" & rem // (file name extension)
rem // Determine number of lines excluding header:
for /F %%I in ('^< "%_FILE%" find /V /C ""') do set /A "COUNT=%%I-1"
rem // Split file into multiple ones:
setlocal EnableDelayedExpansion
rem // Read file once:
< "!_FILE!" (
    rem // Read header (first line):
    set /P HEADER=""
    rem // Calculate number of output files:
    set /A "DIV=(COUNT-1)/_LIMIT+1"
    rem // Iterate over output files:
    for /L %%J in (1,1,!DIV!) do (
        rem // Write an output file:
        > "!NAME!_%%J!EXT!" (
            rem // Write header:
            echo/!HEADER!
            rem // Write as many lines as specified:
            for /L %%I in (1,1,%_LIMIT%) do (
                set "LINE=" & set /P LINE=""
                if defined LINE echo/!LINE!
            )
        )
    )
)
endlocal
endlocal
exit /B
The advantage of this method is that the input file, as well as each output file, is opened only once.
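For comparison, the same "open the input once, open each output once" idea can be sketched in Python. This is an assumption-laden illustration rather than a port of the batch script: split_csv is a hypothetical helper, the output naming (name_N.ext) merely imitates the scripts above, and unlike the batch version it keeps empty lines and writes no file at all when the input holds only a header.

```python
import itertools
import os


def split_csv(path, limit):
    """Split `path` into <name>_<n><ext> files of at most `limit` records each.

    Hypothetical helper; returns the list of output paths written.
    """
    name, ext = os.path.splitext(path)
    written = []
    with open(path, newline="") as source:          # input is opened once
        header = source.readline()                  # first line is the header
        for number in itertools.count(1):
            # Pull the next `limit` lines from the already open input.
            chunk = list(itertools.islice(source, limit))
            if not chunk:                           # input exhausted
                break
            out_path = f"{name}_{number}{ext}"
            with open(out_path, "w", newline="") as target:  # each output opened once
                target.write(header)
                target.writelines(chunk)
            written.append(out_path)
    return written
```

Calling split_csv("data.csv", 5) on the nine-record file from the question would be expected to produce data_1.csv with five records and data_2.csv with four, each starting with the header line.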