Create a batch file to split a CSV by distinct IDs

Time: 2018-04-05 15:38:14

Tags: powershell csv batch-file split

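I have a huge CSV file (data.csv) that I need to split into smaller CSV files by a certain number of distinct ID values (not by number of rows), making sure that all records for each ID stay together. I also need to keep the header in every output file. For example, this is the original file: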

ID    Date   
1     01/01/2010
1     02/01/2010
2     01/01/2010
2     05/01/2010
2     06/01/2010
3     06/01/2010
3     07/01/2010
4     08/01/2010
4     09/01/2010

If I split the file after every two distinct ID values, I should get the first 5 records in data_1.csv and the last 4 records in data_2.csv.
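The grouping rule itself is independent of batch: walk the rows in order and start a new output chunk (with its own copy of the header) whenever a new ID appears and the current chunk already holds the requested number of distinct IDs. Below is a minimal Python sketch of that rule. It assumes equal IDs are adjacent (sorted input) and that the ID is the first whitespace-separated field, as in the sample; the function names `split_by_distinct_ids` and `write_chunks` are hypothetical.

```python
def split_by_distinct_ids(text, ids_per_file, delimiter=None):
    """Group data rows so each chunk holds at most `ids_per_file` distinct IDs.

    Assumes the ID is the first field and rows with the same ID are adjacent.
    delimiter=None splits fields on whitespace (as in the sample data).
    Returns a list of chunks; each chunk is a list of lines starting with the header.
    """
    lines = text.splitlines()
    header = lines[0]
    rows = [line for line in lines[1:] if line.strip()]  # skip blank lines
    chunks = []
    last_id = None
    distinct_in_chunk = 0
    for row in rows:
        row_id = row.split(delimiter)[0]
        if row_id != last_id:
            # New ID: open a new chunk if none exists yet or the current one is full.
            if not chunks or distinct_in_chunk == ids_per_file:
                chunks.append([header])
                distinct_in_chunk = 0
            distinct_in_chunk += 1
            last_id = row_id
        chunks[-1].append(row)
    return chunks


def write_chunks(chunks, name="data", ext=".csv"):
    """Write chunk n to <name>_<n><ext>, e.g. data_1.csv, data_2.csv (UTF-8 assumed)."""
    for n, chunk in enumerate(chunks, start=1):
        with open(f"{name}_{n}{ext}", "w", encoding="utf-8") as f:
            f.write("\n".join(chunk) + "\n")
```

With the sample data and `ids_per_file=2`, this yields one chunk with IDs 1 and 2 (5 records) and one with IDs 3 and 4 (4 records), each led by the header. If the input is not sorted, a repeated ID is counted as a new distinct value, so it would end up in a later file.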

My code is a .bat file that only splits by number of lines. I don't know how to modify it, and I'm open to other options, such as PowerShell.

@echo off
setlocal EnableExtensions DisableDelayedExpansion

rem // Define constants here:
set "_FILE=%~dp0data.csv"   & rem // (input file: data.csv in the script's folder)
set /A "_LIMIT=5" & rem // (number of records or rows per output file)

rem // Split file name:
set "NAME=data" & rem // (output file name prefix)
set "EXT=.csv"    & rem // (output file name extension)

rem // Split file into multiple ones:
set "HEADER=" & set /A "INDEX=0, COUNT=0"
rem // Read file once:
for /F "usebackq delims=" %%L in ("%_FILE%") do (
    rem // Read header if not done yet:
    if not defined HEADER (
        set "HEADER=%%L"
    ) else (
        set "LINE=%%L"
        rem // Compute line index, previous and current file count:
        set /A "PREV=COUNT, COUNT=INDEX/_LIMIT+1, INDEX+=1"
        rem // Write header once per output file:
        setlocal EnableDelayedExpansion
        >&2 echo !INDEX!; !PREV!, !COUNT!
        if !PREV! lss !COUNT! (
            > "!NAME!_!COUNT!!EXT!" echo/!HEADER!
        )
        rem // Write line:
        >> "!NAME!_!COUNT!!EXT!" echo/!LINE!
        endlocal
    )
)

endlocal
exit /b

3 answers:

Answer 0 (score: 1)

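Assuming you want to write a certain number of distinct ID values into each output file, and that the input file data.csv is sorted by these values as in your sample data, the following batch file should work for you: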

@echo off
setlocal EnableExtensions DisableDelayedExpansion

rem // Define constants here:
set "_FILE=%~1"   & rem // (first command line argument is input file)
set /A "_LIMIT=2" & rem // (number of distinct values in first column per output file)

rem // Split file name:
set "NAME=%~dpn1" & rem // (path and file name)
set "EXT=%~x1"    & rem // (file name extension)

rem // Split file into multiple ones:
set "HEADER=" & set "OLD=" & set /A "INDEX=-1, COUNT=0"
rem // Read file once:
for /F "usebackq delims=" %%L in ("%_FILE%") do (
    rem // Read header if not done yet:
    if not defined HEADER (
        set "HEADER=%%L"
    ) else (
        set "LINE=%%L"
        rem // Split off value in first column:
        for /F "tokens=1" %%I in ("%%L") do (
            set "NEW=%%I"
            rem // Compute value index:
            setlocal EnableDelayedExpansion
            if not "!NEW!"=="!OLD!" (
                endlocal
                set /A "INDEX+=1"
            ) else endlocal
            rem // Compute previous and current file count:
            set /A "PREV=COUNT, COUNT=INDEX/_LIMIT+1"
            setlocal EnableDelayedExpansion
            rem // Write header once per output file:
            if !PREV! lss !COUNT! (
                > "!NAME!_!COUNT!!EXT!" echo/!HEADER!
            )
            rem // Write line:
            >> "!NAME!_!COUNT!!EXT!" echo/!LINE!
            endlocal
            set "OLD=%%I"
        )
    )
)

endlocal
exit /B
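The core of this script is two counters: INDEX, the zero-based index of the current distinct value (it starts at -1 and is incremented whenever the first column differs from the previous row), and COUNT, the output file number computed as INDEX/_LIMIT+1 with `set /A` integer division. The header is written whenever COUNT grows past PREV. The hypothetical Python function below only reproduces that arithmetic so it can be checked against the sample data; it does not read or write files.

```python
def file_numbers(ids, limit):
    """Mirror the INDEX/PREV/COUNT arithmetic of the batch script above.

    Returns one (file_number, writes_header) pair per data row.
    Assumes equal IDs are adjacent, as the answer requires.
    """
    result = []
    index, count, old = -1, 0, None
    for new in ids:
        if new != old:
            index += 1                     # a new distinct value was seen
        prev, count = count, index // limit + 1
        result.append((count, prev < count))  # prev < count -> header goes first
        old = new
    return result
```

For the sample IDs 1,1,2,2,2,3,3,4,4 with a limit of 2, the first five rows map to file 1 and the last four to file 2, with a header written before data rows 1 and 6. It also shows why sorting matters: for IDs 1,2,1 with a limit of 1, the second 1 lands in a third file instead of rejoining the first.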

Answer 1 (score: 1)

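The code you posted has nothing to do with the problem you describe, so it doesn't make much sense to use it as a starting point...

The batch file below does what you asked for in the problem description:

EDIT: The code has been modified to use a semicolon as the delimiter.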

@echo off
setlocal EnableDelayedExpansion

set "distinctIDs=2"

set "lastID="
set /A "newIDs=-1, file=0"
for /F "tokens=1,2 delims=;" %%a in (data.csv) do (
   if not defined header (
      set "header=%%a;%%b"
   ) else (
      if "%%a" neq "!lastID!" (
         set "lastID=%%a"
         set /A newIDs+=1, newFile=newIDs%%distinctIDs
         if !newFile! equ 0 (
            set /A file+=1
            > data_!file!.csv echo !header!
         )
      )
      >> data_!file!.csv echo %%a;%%b
   )
)
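This answer uses a different counter scheme: newIDs counts distinct IDs starting from -1, and a new data_N.csv (with the header) is started whenever newIDs % distinctIDs is 0. Because of `tokens=1,2 delims=;`, only the first two semicolon-separated fields are written back. A rough Python equivalent is sketched below, assuming semicolon-delimited, sorted input; the function name `split_semicolon` is hypothetical, and it does not reproduce every `for /F` quirk (for example, `for /F` treats consecutive delimiters as one, so an empty field would shift the following ones).

```python
def split_semicolon(lines, distinct_ids=2):
    """Approximate Answer 1: start data_<n>.csv whenever newIDs % distinct_ids == 0.

    Returns a dict mapping output file names to their lines (header first).
    Only the first two ';'-separated fields are kept, like %%a;%%b in the batch code.
    """
    files = {}
    header = None
    last_id = None
    new_ids, file_no = -1, 0
    for line in lines:
        if not line.strip():
            continue                      # for /F skips empty lines as well
        fields = line.split(";")
        if header is None:
            header = ";".join(fields[:2])
            continue
        if fields[0] != last_id:
            last_id = fields[0]
            new_ids += 1
            if new_ids % distinct_ids == 0:
                file_no += 1
                files[f"data_{file_no}.csv"] = [header]
        files[f"data_{file_no}.csv"].append(";".join(fields[:2]))
    return files
```

The modulo test and the integer division in Answer 0 are two ways of expressing the same rule: a new file begins at distinct IDs 0, N, 2N, and so on.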

Answer 2 (score: 0)

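It seems to me that the file must be TAB- or SPACE-delimited for all of these .bat files to work correctly. If the file is ";"-delimited, then (1) we should first replace ";" with TAB and (2) then run aschipfl's or Aacini's code. Both use a TAB-delimited .txt file. Here is the code that does part (1):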

@echo off
setlocal enableextensions enabledelayedexpansion

rem Get a tab character
for /f tokens^=^*^ delims^= %%t in ('forfiles /p "%~dp0." /m "%~nx0" /c "cmd /c echo(0x09"') do set "tab=%%t" 

rem For each line in text file, replace ; with a tab    
(for /f "tokens=*" %%l in (data_new.txt) do (
    set "line=%%l"  
    echo !line:;=%tab%!
)) > data_new_tab.txt

endlocal
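The conversion above is a plain text substitution (`!line:;=%tab%!`), so it also replaces semicolons that appear inside quoted CSV fields. If the data may contain such fields, a CSV-aware conversion is safer. The sketch below swaps in Python's `csv` module, which is a different technique from this answer's, shown only for comparison; the function names are hypothetical, the file names are taken from the batch code, and UTF-8 encoding is an assumption.

```python
import csv
import io


def semicolons_to_tabs(text):
    """Re-delimit ';'-separated text as TAB-separated.

    Unlike a plain replace, csv.reader respects quoted fields, so the ';'
    inside "a;b" stays part of that field instead of becoming a TAB.
    """
    rows = csv.reader(io.StringIO(text), delimiter=";")
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()


def convert_file(src="data_new.txt", dst="data_new_tab.txt"):
    """Read src, convert ';' to TAB, and write dst (UTF-8 assumed)."""
    with open(src, newline="", encoding="utf-8") as f_in:
        text = f_in.read()
    with open(dst, "w", newline="", encoding="utf-8") as f_out:
        f_out.write(semicolons_to_tabs(text))
```

For plain input such as `ID;Date` this gives the same result as the batch substitution. The two approaches differ only when a field is quoted and contains a semicolon.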