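How do I remove carriage returns/line feeds from a CSV, except at the end of each row?

Time: 2016-08-24 13:58:20

Tags: csv batch-file formatting

Is it possible to use a batch file or PowerShell to remove carriage returns/line feeds from a CSV without removing the natural end of each record?

Basically, I have a file like this: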

a1, a2, a3, a4,aaa
aaa a5, a6, a7,aaa aa
a8
b1,b2,b3,b4,b5,b6,b7,b8
c1,c2,c3,c4,c5,c6,c7,c8
d1,d2,d3,d4,d5,d6,d7,d8
e1,e2,e3,e4,eee
e5,e6,e7,e8

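For example, columns 5 and 8 "may" contain carriage returns/line feeds. I want to remove them so that the file has 1 line = 1 record.

Is this possible? I already use a batch file to format the file, so I would like to use it for all formatting if possible. I am considering moving to PowerShell, so let me know if that would be easier (absolute PowerShell noob).

NP edit - Every row has the same number of columns. In this example, 8.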

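2 Answers:

Answer 0: (score: 2)

Tricky, but I couldn't resist a nice challenge... although you showed no effort of your own to solve it...

Here is a script that combines lines of CSV data whenever the number of elements falls short of a predefined count. It does not process elements individually; it simply appends lines until the proposed number is reached. The data must not contain any global wildcard characters such as * or ?. Quotation marks must not occur either, unless they are doubled (""). Here it is: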

@echo off
setlocal EnableExtensions DisableDelayedExpansion

rem // Define constants here:
set "FILE_I=%~1"  & rem // (specifies the input CSV file)
set "FILE_O=%~2"  & rem // (specifies the output CSV file)
set "SEPARATOR=," & rem // (is the separator used in the CSV data)
set "REPLACE="    & rem // (is the replacement string for each line-break)
set "NUMITEMS=8"  & rem // (is the proposed number of elements per line)

rem // Validate given input and output CSV files:
if not exist "%FILE_I%" (< "%FILE_I%" set /P ="" & exit /B 1)
if not defined FILE_O set "FILE_O=con"

rem // Initialise data collector and counter for elements:
set "PREV=" & set /A "COUNT=0"
rem // Iterate through lines of input file:
for /F delims^=^ eol^= %%L in ('
    rem/ /* Read input file, output dummy line and deplete output file: */ ^& ^
        type "%FILE_I%" ^& ^> "%FILE_O%" break ^& echo/^& ^
        for /L %%J in ^(2^,1^,%NUMITEMS%^) do @^< nul set /P ^=","
') do (
    rem // Store currently read line:
    set "LINE=%%L"
    rem // Toggle delayed expansion in order not to lose `!`:
    setlocal EnableDelayedExpansion
    rem // Add number of elements of current line to the counter:
    for %%I in ("!LINE:%SEPARATOR%=","!") do (
        endlocal
        set /A "COUNT+=1"
        setlocal EnableDelayedExpansion
    )
    rem // Check whether counter reached given number of elements per line:
    if !COUNT! LEQ %NUMITEMS% (
        rem /* Either proposed number of elements not reached, hence store data
        rem    and wait for next line to have enough elements;
        rem    or number is reached but still wait for the next line, because it
        rem    could be a single element to be appended to the previous line;
        rem    hence the data output is actually delayed by one loop iteration;
        rem    so to not lose the last line, the said dummy line is needed: */
        set "PREV=!PREV!%REPLACE%!LINE!"
        rem // Transport data collector over `endlocal` barrier:
        for /F delims^=^ eol^= %%K in ("!PREV!") do (
            endlocal
            set "PREV=%%K"
            setlocal EnableDelayedExpansion
        )
        rem /* Decrement counter because a single element is considered
        rem    to be part of the last element of the previous line: */
        endlocal
        set /A "COUNT-=1"
        setlocal EnableDelayedExpansion
    ) else (
        rem /* Proposed number of elements exceeded, hence output currently
        rem    collected data, reset collector and counter for elements: */
        if defined REPLACE set "PREV=!PREV:*%REPLACE%=!"
        >> "%FILE_O%" echo !PREV!
        endlocal
        rem /* Store current line in data collector and subtract
        rem    the number of output elements from counter: */
        set "PREV=%REPLACE%%%L"
        set /A "COUNT-=%NUMITEMS%"
        setlocal EnableDelayedExpansion
    )
    endlocal
)

endlocal
exit /B

Assuming the script is saved as concat-csv-lines.bat, the input CSV file is named broken-lines.csv and the output file is concatenated.csv, run it with the following command line:

concat-csv-lines.bat broken-lines.csv concatenated.csv

If broken-lines.csv contains the sample data from the question, concatenated.csv will contain:

a1, a2, a3, a4,aaaaaa a5, a6, a7,aaa aaa8
b1,b2,b3,b4,b5,b6,b7,b8
c1,c2,c3,c4,c5,c6,c7,c8
d1,d2,d3,d4,d5,d6,d7,d8
e1,e2,e3,e4,eeee5,e6,e7,e8
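For readers who find the batch syntax hard to follow, the merging rule can be sketched in Python. This is only an illustration of the logic described above, not a translation of the script: the function name `merge_broken_lines` and its parameters are made up for this sketch, and it assumes the same restrictions (no quoted fields, the separator never appears inside a field). A continuation line's first piece belongs to the last element of the previous line, so joining two lines yields one element fewer than the sum of their counts.

```python
def merge_broken_lines(lines, num_items=8, separator=",", replace=""):
    """Join physical lines into records of num_items elements each.

    Hypothetical helper for illustration; names are not from the batch script.
    """
    records = []
    prev = None   # record collected so far
    count = 0     # number of elements in prev
    for line in lines:
        if not line:
            continue  # blank lines are skipped, as `for /F` does
        n = len(line.split(separator))
        if prev is None:
            prev, count = line, n
        elif count + n - 1 <= num_items:
            # Still fits: the first piece of this line continues the
            # last element of the collected record.
            prev = prev + replace + line
            count += n - 1
        else:
            # Would exceed the proposed number: the record is complete.
            records.append(prev)
            prev, count = line, n
    if prev is not None:
        records.append(prev)  # do not lose the last record
    return records


sample = [
    "a1, a2, a3, a4,aaa",
    "aaa a5, a6, a7,aaa aa",
    "a8",
    "b1,b2,b3,b4,b5,b6,b7,b8",
    "e1,e2,e3,e4,eee",
    "e5,e6,e7,e8",
]
for record in merge_broken_lines(sample):
    print(record)
```

A record is only written once the following line shows that it cannot be a continuation, which is why a break inside the last column (the lone `a8` line) is still handled.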

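Answer 1: (score: 0)

I added another column (now nine), because this would not work with an "in-line CRLF" in the last token (and you said token 8 might have one). (I understand that you have some influence on how the CSV file is created.) The code is described in its REM comments.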

@echo off 
setlocal enabledelayedexpansion
REM empty the variable:
set "line="
for /f "delims=" %%a in (t.csv) do (
  REM append line from file to variable
  set "line=!line! %%a"
  REM rescue spaces (by replacing with another character)
  REM for proper token counting
  set "line=!line: =²!"
  set n=0
  REM count tokens:
  for %%b in (!line!) do set /a n+=1
  if !n! geq 9 (
    REM if 9 (or more) tokens, the assembly is finished.
    REM re-replace the spaces
    set "line=!line:²= !"
    REM cut the first char (a space):
    set "line=!line:~1!"
    REM output the line:
    echo !line!
    REM and clear the variable for the next logical line:
    set "line="
  ) 
)

This tolerates, to some degree, a line with more than <n> elements, but it fails if a line has fewer.
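The accumulate-until-enough idea can be sketched in Python to make the limitation visible. This is a rough illustration under stated assumptions, not a port of the batch code: `join_until_enough` and `min_fields` are hypothetical names, and only commas are counted as separators (the batch `for` loop also splits on a few other delimiters).

```python
def join_until_enough(lines, min_fields=9, separator=","):
    """Append lines to a buffer and emit it once it holds min_fields elements.

    Hypothetical helper for illustration only.
    """
    out = []
    buf = ""
    for line in lines:
        buf = buf + " " + line          # lines are joined with a space
        if len(buf.split(separator)) >= min_fields:
            out.append(buf[1:])         # cut the leading space
            buf = ""                    # start the next logical line
    return out


# Hypothetical nine-column data with a break inside column 5:
rows = [
    "a1,a2,a3,a4,aaa",
    "aaa a5,a6,a7,a8,a9",
    "b1,b2,b3,b4,b5,b6,b7,b8,b9",
]
for row in join_until_enough(rows):
    print(row)
```

Because a record is emitted as soon as the count is reached, a break inside the last column cannot be detected: the fragment after the break starts a new buffer that never reaches the threshold and is never output. That is the reason for the extra ninth column.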