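Is it possible to use a batch file or PowerShell to remove carriage returns/line feeds from a CSV without removing the natural end of each record?
Basically, I have a file like this: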
a1, a2, a3, a4,aaa
aaa a5, a6, a7,aaa aa
a8
b1,b2,b3,b4,b5,b6,b7,b8
c1,c2,c3,c4,c5,c6,c7,c8
d1,d2,d3,d4,d5,d6,d7,d8
e1,e2,e3,e4,eee
e5,e6,e7,e8
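For example, columns 5 and 8 *may* contain carriage returns/line feeds. I want to remove them so that the file has 1 line = 1 record.
Is this possible? I already use a batch file to format the file, so if possible I would like to use it for all of the formatting. I am thinking about moving to PowerShell, so let me know if that would be easier (absolute PowerShell noob).
NP edit: every row has the same number of columns. In this example, 8.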
Answer 0 (score: 2)
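Tricky, but I could not resist a nice challenge... although you did not show any effort of your own to solve it...
Here is a script that combines lines of CSV data whenever the number of elements does not match a predefined figure. It does not process the elements individually; it just appends lines until the proposed number is reached. The data must not contain any global wildcard characters such as * and ?. Quotation marks must not occur either, unless they are doubled (""). Here it is: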
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem // Define constants here:
set "FILE_I=%~1" & rem // (specifies the input CSV file)
set "FILE_O=%~2" & rem // (specifies the output CSV file)
set "SEPARATOR=," & rem // (is the separator used in the CSV data)
set "REPLACE=" & rem // (is the replacement string for each line-break)
set "NUMITEMS=8" & rem // (is the proposed number of elements per line)
rem // Validate given input and output CSV files:
if not exist "%FILE_I%" (< "%FILE_I%" set /P ="" & exit /B 1)
if not defined FILE_O set "FILE_O=con"
rem // Initialise data collector and counter for elements:
set "PREV=" & set /A "COUNT=0"
rem // Iterate through lines of input file:
for /F delims^=^ eol^= %%L in ('
    rem/ /* Read input file, output dummy line and deplete output file: */ ^& ^
        type "%FILE_I%" ^& ^> "%FILE_O%" break ^& echo/^& ^
        for /L %%J in ^(2^,1^,%NUMITEMS%^) do @^< nul set /P ^=","
') do (
    rem // Store currently read line:
    set "LINE=%%L"
    rem // Toggle delayed expansion in order not to lose `!`:
    setlocal EnableDelayedExpansion
    rem // Add number of elements of current line to the counter:
    for %%I in ("!LINE:%SEPARATOR%=","!") do (
        endlocal
        set /A "COUNT+=1"
        setlocal EnableDelayedExpansion
    )
    rem // Check whether counter reached given number of elements per line:
    if !COUNT! LEQ %NUMITEMS% (
        rem /* Either proposed number of elements not reached, hence store data
        rem    and wait for next line to have enough elements;
        rem    or number is reached but still wait for the next line, because it
        rem    could be a single element to be appended to the previous line;
        rem    hence the data output is actually delayed by one loop iteration;
        rem    so to not lose the last line, the said dummy line is needed: */
        set "PREV=!PREV!%REPLACE%!LINE!"
        rem // Transport data collector over `endlocal` barrier:
        for /F delims^=^ eol^= %%K in ("!PREV!") do (
            endlocal
            set "PREV=%%K"
            setlocal EnableDelayedExpansion
        )
        rem /* Decrement counter because a single element is considered
        rem    to be part of the last element of the previous line: */
        endlocal
        set /A "COUNT-=1"
        setlocal EnableDelayedExpansion
    ) else (
        rem /* Proposed number of elements exceeded, hence output currently
        rem    collected data, reset collector and counter for elements: */
        if defined REPLACE set "PREV=!PREV:*%REPLACE%=!"
        >> "%FILE_O%" echo !PREV!
        endlocal
        rem /* Store current line in data collector and subtract
        rem    the number of output elements from counter: */
        set "PREV=%REPLACE%%%L"
        set /A "COUNT-=%NUMITEMS%"
        setlocal EnableDelayedExpansion
    )
    endlocal
)
endlocal
exit /B
Assuming the script is saved as concat-csv-lines.bat, the input CSV file is named broken-lines.csv and the output file is concatenated.csv, run it with the following command line:
concat-csv-lines.bat broken-lines.csv concatenated.csv
If broken-lines.csv contains the sample data from the question, concatenated.csv will hold:
a1, a2, a3, a4,aaaaaa a5, a6, a7,aaa aaa8
b1,b2,b3,b4,b5,b6,b7,b8
c1,c2,c3,c4,c5,c6,c7,c8
d1,d2,d3,d4,d5,d6,d7,d8
e1,e2,e3,e4,eeee5,e6,e7,e8
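For anyone who finds the batch logic hard to follow, the counting rule the script relies on can be sketched in Python. This is only an illustration, not a translation of the batch script: it assumes plain comma-separated fields without quoting, and the function name merge_broken_lines and its parameters are made up for the example.

```python
def merge_broken_lines(lines, num_items=8, separator=",", replace=""):
    """Join physical lines until each record holds num_items fields.

    Hypothetical helper for illustration; assumes no quoted fields.
    """
    records = []
    pieces = []  # physical lines collected for the current record
    count = 0    # running number of fields seen for the current record
    for line in lines:
        if line == "":
            continue  # the batch FOR /F loop skips empty lines as well
        count += line.count(separator) + 1
        if count <= num_items:
            # Not enough fields yet, or exactly enough but the next line
            # may still be the tail of the last field: keep collecting.
            pieces.append(line)
            # The piece after a line break continues the previous field,
            # so the two halves must be counted as one field.
            count -= 1
        else:
            # The current line starts a new record: emit the collected one.
            records.append(replace.join(pieces))
            pieces = [line]
            count -= num_items
    if pieces:
        records.append(replace.join(pieces))  # emit the last record
    return records


sample = [
    "a1, a2, a3, a4,aaa",
    "aaa a5, a6, a7,aaa aa",
    "a8",
    "b1,b2,b3,b4,b5,b6,b7,b8",
    "e1,e2,e3,e4,eee",
    "e5,e6,e7,e8",
]
for record in merge_broken_lines(sample):
    print(record)
```

Because the sketch can emit the last record after the loop ends, it does not need the dummy line that the batch script appends to its input.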
Answer 1 (score: 0)
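I added another column (nine now), because this does not work if there is an "in-line CRLF" in the last token (and you stated that token 8 might have one). (As I understand it, you have some influence on how the csv file is created.) The code is explained by its REM comments.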
@echo off
setlocal enabledelayedexpansion
REM empty variable:
set "line="
for /f "delims=" %%a in (t.csv) do (
  REM append line from file to variable
  set "line=!line! %%a"
  REM rescue spaces (by replacing with another character)
  REM for proper token counting
  set "line=!line: =²!"
  set n=0
  REM count tokens:
  for %%b in (!line!) do set /a n+=1
  if !n! geq 9 (
    REM if 9 (or more) tokens, the assembly is finished.
    REM re-replace the spaces
    set "line=!line:²= !"
    REM cut the first char (a space):
    set "line=!line:~1!"
    REM output the line:
    echo !line!
    REM and clear the variable for the next logical line:
    set "line="
  )
)
There is some tolerance if a line has more than <n> elements, but it will fail if a line has fewer.
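The token-counting idea of this answer can also be sketched in Python to show why short lines are a problem. This is a rough illustration, not the batch script itself: it assumes nine comma-separated columns with no line break in the last one, and the function name join_until_enough_fields is made up for the example.

```python
def join_until_enough_fields(lines, min_fields=9, separator=","):
    """Append physical lines (joined by a space) until min_fields is reached.

    Hypothetical helper for illustration; assumes no quoted fields.
    Simplification: str.split also counts empty fields.
    """
    records = []
    buffer = ""
    for line in lines:
        # A line break inside a field is replaced by a single space.
        buffer = f"{buffer} {line}" if buffer else line
        if len(buffer.split(separator)) >= min_fields:
            records.append(buffer)  # enough fields: the record is complete
            buffer = ""
    # Whatever is left in the buffer never reached min_fields and is
    # silently dropped, which is the failure case for short lines.
    return records


sample = [
    "a1,a2,a3,a4,aaa",
    "aaa a5,a6,a7,a8,a9",
    "b1,b2,b3,b4,b5,b6,b7,b8,b9",
]
for record in join_until_enough_fields(sample):
    print(record)
```

A record is emitted as soon as the field count is reached, so a line break inside the last column cannot be detected; that is why the extra ninth column was added.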