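I have a huge CSV file (data.csv) that I need to split into smaller CSV files by a given number of distinct ID values (not by row count), keeping all records of each ID together in the same file. I also need to keep the header in every output file. For example, this is the original file: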
    ID Date
    1 01/01/2010
    1 02/01/2010
    2 01/01/2010
    2 05/01/2010
    2 06/01/2010
    3 06/01/2010
    3 07/01/2010
    4 08/01/2010
    4 09/01/2010
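If I split the file after every two distinct ID values, I should see the first 5 records in data_1.csv and the last 4 records in data_2.csv.
My code is a .bat file that only splits by number of rows. I don't know how to modify it, and I am open to other options such as PowerShell.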
    @echo off
    setlocal EnableExtensions DisableDelayedExpansion
    rem // Define constants here:
    set "_FILE=%~dp0data.csv" & rem // (first command line argument is input file)
    set /A "_LIMIT=5" & rem // (number of records or rows per output file)
    rem // Split file name:
    set "NAME=data" & rem // (path and file name)
    set "EXT=%~x1.csv" & rem // (file name extension)
    rem // Split file into multiple ones:
    set "HEADER=" & set /A "INDEX=0, COUNT=0"
    rem // Read file once:
    for /F "usebackq delims=" %%L in ("%_FILE%") do (
        rem // Read header if not done yet:
        if not defined HEADER (
            set "HEADER=%%L"
        ) else (
            set "LINE=%%L"
            rem // Compute line index, previous and current file count:
            set /A "PREV=COUNT, COUNT=INDEX/_LIMIT+1, INDEX+=1"
            rem // Write header once per output file:
            setlocal EnableDelayedExpansion
            >&2 echo !INDEX!; !PREV!, !COUNT!
            if !PREV! lss !COUNT! (
                > "!NAME!_!COUNT!!EXT!" echo/!HEADER!
            )
            rem // Write line:
            >> "!NAME!_!COUNT!!EXT!" echo/!LINE!
            endlocal
        )
    )
    endlocal
    exit /b
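Answer 0: (score: 1)
Assuming you want to write a certain number of distinct ID values to each output file, and the input file data.csv is already sorted on those values as in your sample data, the following batch file could work for you: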
    @echo off
    setlocal EnableExtensions DisableDelayedExpansion
    rem // Define constants here:
    set "_FILE=%~1" & rem // (first command line argument is input file)
    set /A "_LIMIT=2" & rem // (number of distinct values in first column per output file)
    rem // Split file name:
    set "NAME=%~dpn1" & rem // (path and file name)
    set "EXT=%~x1" & rem // (file name extension)
    rem // Split file into multiple ones:
    set "HEADER=" & set "OLD=" & set /A "INDEX=-1, COUNT=0"
    rem // Read file once:
    for /F "usebackq delims=" %%L in ("%_FILE%") do (
        rem // Read header if not done yet:
        if not defined HEADER (
            set "HEADER=%%L"
        ) else (
            set "LINE=%%L"
            rem // Split off value in first column:
            for /F "tokens=1" %%I in ("%%L") do (
                set "NEW=%%I"
                rem // Compute value index:
                setlocal EnableDelayedExpansion
                if not "!NEW!"=="!OLD!" (
                    endlocal
                    set /A "INDEX+=1"
                ) else endlocal
                rem // Compute previous and current file count:
                set /A "PREV=COUNT, COUNT=INDEX/_LIMIT+1"
                setlocal EnableDelayedExpansion
                rem // Write header once per output file:
                if !PREV! lss !COUNT! (
                    > "!NAME!_!COUNT!!EXT!" echo/!HEADER!
                )
                rem // Write line:
                >> "!NAME!_!COUNT!!EXT!" echo/!LINE!
                endlocal
                set "OLD=%%I"
            )
        )
    )
    endlocal
    exit /B
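Since the question is open to tools other than batch, here is a rough Python sketch of the same single-pass idea: count distinct first-column values and start a new output file (with the header) every `limit` values. It is not a translation of the batch file above. The function name `split_by_distinct_ids` and its parameters are made up for illustration, and it assumes, like the batch file, that the rows are already grouped by ID and that the first line is the header.

```python
import os


def split_by_distinct_ids(path, limit=2, delimiter=None):
    """Split `path` into <name>_1<ext>, <name>_2<ext>, ... with `limit`
    distinct first-column values per output file.

    Assumption: rows with the same ID are adjacent (input sorted by ID).
    delimiter=None splits on any whitespace; pass ";" or "," if needed.
    Returns the list of output file names that were written.
    """
    base, ext = os.path.splitext(path)
    outputs = []
    out = None
    last_id = None
    distinct_seen = 0  # number of distinct IDs encountered so far
    with open(path, "r", newline="") as src:
        header = src.readline()
        for line in src:
            if not line.strip():
                continue  # ignore blank lines
            current_id = line.rstrip("\r\n").split(delimiter, 1)[0]
            if current_id != last_id:
                # Every `limit` distinct IDs, open the next output file
                if distinct_seen % limit == 0:
                    if out is not None:
                        out.close()
                    name = f"{base}_{distinct_seen // limit + 1}{ext}"
                    out = open(name, "w", newline="")
                    out.write(header)
                    outputs.append(name)
                distinct_seen += 1
                last_id = current_id
            out.write(line)
    if out is not None:
        out.close()
    return outputs
```

Calling `split_by_distinct_ids("data.csv", limit=2)` on the sample data should produce data_1.csv and data_2.csv next to the input. Because only one line is held in memory at a time, the approach should also cope with large files.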
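Answer 1: (score: 1)
The code you provided has nothing to do with the problem you describe, so using it as a base does not make much sense...
The batch file below does what you requested in the problem description:
EDIT: The code was modified to use a semicolon as the delimiter.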
    @echo off
    setlocal EnableDelayedExpansion
    set "distinctIDs=2"
    set "lastID="
    set /A "newIDs=-1, file=0"
    for /F "tokens=1,2 delims=;" %%a in (data.csv) do (
        if not defined header (
            set "header=%%a;%%b"
        ) else (
            if "%%a" neq "!lastID!" (
                set "lastID=%%a"
                set /A newIDs+=1, newFile=newIDs%%distinctIDs
                if !newFile! equ 0 (
                    set /A file+=1
                    > data_!file!.csv echo !header!
                )
            )
            >> data_!file!.csv echo %%a;%%b
        )
    )
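To check the counter arithmetic without touching any files, the two numbering schemes that appear in this thread (integer division, `INDEX/_LIMIT+1`, and the remainder test, `newIDs %% distinctIDs`) can be traced on the ID column of the sample data. The Python sketch below is only an illustration; the function names are invented here and the variables merely mirror the batch counters.

```python
def files_by_division(ids, limit):
    """Division scheme: file number = INDEX // limit + 1,
    where INDEX counts distinct IDs and starts at -1."""
    index, old, result = -1, None, []
    for value in ids:
        if value != old:
            index += 1
            old = value
        result.append(index // limit + 1)
    return result


def files_by_modulo(ids, limit):
    """Remainder scheme: bump the file number whenever the
    distinct-ID counter is a multiple of limit."""
    new_ids, file_no, last, result = -1, 0, None, []
    for value in ids:
        if value != last:
            last = value
            new_ids += 1
            if new_ids % limit == 0:
                file_no += 1
        result.append(file_no)
    return result


sample = ["1", "1", "2", "2", "2", "3", "3", "4", "4"]
print(files_by_division(sample, 2))  # [1, 1, 1, 1, 1, 2, 2, 2, 2]
print(files_by_modulo(sample, 2))    # [1, 1, 1, 1, 1, 2, 2, 2, 2]
```

Both schemes assign the first 5 records to file 1 and the last 4 to file 2, which matches the split requested in the question.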
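Answer 2: (score: 0)
It seems to me that all of these files have to be TAB- or SPACE-delimited for all of these .bat files to work. If the file is ";"-delimited, then (1) we should first replace ";" with TAB and (2) run aschipfl's or Aacini's code. Both work with TAB-delimited .txt files. Here is the code that performs part (1):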
    @echo off
    setlocal enableextensions enabledelayedexpansion
    rem Get a tab character
    for /f tokens^=^*^ delims^= %%t in ('forfiles /p "%~dp0." /m "%~nx0" /c "cmd /c echo(0x09"') do set "tab=%%t"
    rem For each line in text file, replace ; with a tab
    (for /f "tokens=*" %%l in (data_new.txt) do (
        set "line=%%l"
        echo !line:;=%tab%!
    )) > data_new_tab.txt
    endlocal
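If Python is an option, the same delimiter conversion can be sketched with the standard `csv` module, which also copes with quoted fields that contain a semicolon. The function name `semicolons_to_tabs` is hypothetical, and the file names in the usage note are simply the ones used in the batch file above.

```python
import csv


def semicolons_to_tabs(src_path, dst_path):
    """Rewrite a ';'-delimited text file as a TAB-delimited one."""
    with open(src_path, "r", newline="") as src, \
         open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter=";")
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        for row in reader:
            if row:  # skip empty lines
                writer.writerow(row)
```

For example, `semicolons_to_tabs("data_new.txt", "data_new_tab.txt")` would write the converted copy and leave the original file untouched.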