Question

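I have a semicolon-delimited text file (CSV) with 65 columns, but the last one is a "comments" column whose content may itself contain semicolons.
I would like to know how to write a Windows batch file that counts the semicolons in each line and, if it finds more than 64,
removes every semicolon after the 64th (or changes it to a comma). (I don't have access to the source code that builds the text file.)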

Actual example:

marshal;Stevens;Son;11223344;Dual;this person tries food; water; fruit

Expected output:

marshal;Stevens;Son;11223344;Dual;this person tries food, water, fruit

Answer 1

@ECHO OFF
SETLOCAL 
SET "sourcedir=U:\sourcedir"
SET "destdir=U:\destdir"
SET "filename1=%sourcedir%\q56171667.txt"
SET "outfile=%destdir%\outfile.txt"
(
FOR /f "usebackqtokens=1*delims=" %%a IN ("%filename1%") DO (
 SET "line=%%a"
 CALL :lop64
)
)>"%outfile%"

GOTO :EOF

:: remove the first 64 ;-terminated strings from LINE
:: remove remaining `;`

:lop64
SET /a lopoff=64
SET "original=%line%"
:lop64lp
SET "line=%line:*;=%"
SET /a lopoff-=1
IF %lopoff% gtr 0 GOTO lop64lp
CALL ECHO %%original:;%line%=%%;%line:;=%
GOTO :eof

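You will need to change the settings of sourcedir and destdir to suit your circumstances.

I used a file named q56171667.txt containing some dummy data for my testing.

Produces the file defined as %outfile%.

The usebackq option is only required because I chose to put quotes around the source filename.

Batch processing of text is a minefield. You may be better off using sed or (g)awk, which are designed for the task.

I have assumed, in the absence of information to the contrary, that your data does not contain characters with a special meaning, such as % or " or & or | or < or > and so on.

Fundamentally, assign the entire line read to line, then, in a subroutine, replace each string up to and including a ; with nothing (:*;=), 64 times. What is left in line is then the 65th column. Then show the original string with ;line removed from it (that is, the first 64 columns), followed by a ; and then line with each ; replaced by nothing (:;=). Use :;=, there instead if you want to substitute commas for the semicolons.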
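
For readers who find the substitution syntax hard to follow, here is the same idea written out in Python. This is only a rough model of the :lop64 subroutine, not a replacement for it: the function name lop_and_rebuild and its parameters are made up for illustration, it leaves lines with fewer than keep separators untouched (the batch code makes no such promise), and it does not reproduce cmd's handling of special characters.

```python
def lop_and_rebuild(line: str, keep: int = 64, sep: str = ";", new: str = "") -> str:
    """Strip `keep` sep-terminated prefixes, then rebuild the line (hypothetical helper)."""
    rest = line
    for _ in range(keep):
        # Same effect as %line:*;=% : drop everything up to and including the first sep.
        _, found, tail = rest.partition(sep)
        if not found:
            # Assumption: a line with fewer than `keep` separators is returned as-is.
            return line
        rest = tail
    # The head is the original line minus the trailing sep + rest.
    head = line[: len(line) - len(rest) - len(sep)]
    # Re-attach the remainder with its separators replaced.
    return head + sep + rest.replace(sep, new)


sample = "marshal;Stevens;Son;11223344;Dual;this person tries food; water; fruit"
print(lop_and_rebuild(sample, keep=5, new=","))
```

With the six-column sample from the question, keep=5 plays the role of 64; passing new="" mirrors the :;= variant that deletes the extra semicolons instead of turning them into commas.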

Answer 2

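Although you have not shown any effort of your own to solve the task, I decided to provide some code because it is a nice challenge. See all the explanatory rem remarks: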

@echo off
setlocal EnableExtensions DisableDelayedExpansion

rem // Define constants here:
set "_FILE=%~1"  & rem // (input file; `%~1` is the first command line argument)
set "_SEP=;"     & rem // (original separator to be replaced)
set "_NEW=,"     & rem // (new separator to replace the old one with)
set /A "_LIM=64" & rem // (number of first original separators to be kept)

rem // Read input file line by line:
for /F usebackq^ delims^=^ eol^= %%L in ("%_FILE%") do (
    rem // Store current line, reset some auxiliary variables:
    set "LINE=%%L" & set "COLL=" & set /A "CNT=-1"
    setlocal EnableDelayedExpansion
    rem // Handle the case when no original separator is defined:
    if defined _SEP (
        rem // Iterate through all separated items of the current line:
        for %%I in ("!LINE:%_SEP%=" "!") do (
            rem // Support loop to transport `COLL` variable over `endlocal` barrier:
            for /F "delims=" %%J in (""!COLL!"") do (
                endlocal
                rem /* Store currently iterated item, increment item counter and
                rem    store rebuilt line with separators replaced as defined: */
                set "ITEM=%%~I" & set /A "CNT+=1" & set "COLL=%%~J"
                setlocal EnableDelayedExpansion
                rem // Check whether or not to exclude current separator:
                if !CNT! gtr %_LIM% (
                    set "COLL=!COLL!!_NEW!!ITEM!"
                ) else if !CNT! gtr 0 (
                    set "COLL=!COLL!!_SEP!!ITEM!"
                ) else set "COLL=!ITEM!"
            )
        )
        rem // Return rebuilt line with separators replaced as defined:
        echo(!COLL!
    ) else echo(!LINE!
    endlocal
)

endlocal
exit /B

Given that the script is saved as repl-sep.bat and the input file is named 1.csv, run the script like this:

repl-sep.bat "1.csv"

To store the output in a file named 2.csv rather than displaying it in the console, use the following command line:

repl-sep.bat "1.csv" > "2.csv"

Note that the following characters must not occur in the input file: ?, *, <, >, ".
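
The counting logic of the script is easier to see without the setlocal/endlocal plumbing. Below is a small Python sketch of just that logic; replace_after_limit is a hypothetical name, and its parameters correspond to _SEP, _NEW and _LIM above. It is a model for checking expected results, not a port of the batch file, and it does not share the character restrictions listed above.

```python
def replace_after_limit(line: str, sep: str = ";", new: str = ",", limit: int = 64) -> str:
    """Keep the first `limit` separators, swap every later one for `new` (hypothetical helper)."""
    items = line.split(sep)
    # The first item is stored without any leading separator (CNT = 0 in the script).
    out = items[0]
    for count, item in enumerate(items[1:], start=1):
        # Separators number 1..limit stay as they are; later ones are replaced.
        out += (new if count > limit else sep) + item
    return out


sample = "marshal;Stevens;Son;11223344;Dual;this person tries food; water; fruit"
print(replace_after_limit(sample, limit=5))
```

Here limit=5 stands in for 64 so that the short sample line from the question can be used.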

Answer 3

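Ideally, you would read the file with for /f and use "tokens=64*delims=;"
to get the remainder of the line, and replace the semicolons only there.

Unfortunately, the maximum token number is 31 (plus the * remainder), so you have to nest several for /F loops: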

:: Q:\Test\2019\05\16\SO_56171667.cmd
@Echo off
Set "FileIn=Col65.csv"
Set "FileOut=NewCol65.csv"

(    for /F "usebackq  delims="  %%a in ("%FileIn%"
) do for /F "tokens=31*delims=;" %%b in ("%%a"
) do for /F "tokens=31*delims=;" %%d in ("%%c"
) do for /f "tokens=2* delims=;" %%f in ("%%e"
) do Call :Sub "%%a" "%%g"
)>"%FileOut%"

Goto :Eof
:Sub
Set "Line=%~1#"
Set "Col65=%~2"
Set "Col65=%Col65:;=,%"
Call Echo:%%Line:%~2#=%Col65%%%
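
The three inner loops above skip 31 + 31 + 2 = 64 fields, so %%g ends up holding column 65. A short Python sketch of that arithmetic follows; take_rest and column_65 are hypothetical names used only for this illustration. It assumes every field is non-empty, because for /F treats consecutive delimiters as a single one, which plain str.split does not.

```python
from typing import Optional


def take_rest(text: str, skip: int, delim: str = ";") -> Optional[str]:
    """Return what follows the first `skip` fields, or None if there are too few."""
    parts = text.split(delim, skip)
    return parts[skip] if len(parts) > skip else None


def column_65(line: str) -> Optional[str]:
    """Chain three skips of 31, 31 and 2 fields, like the nested for /F loops."""
    rest: Optional[str] = line
    for skip in (31, 31, 2):
        if rest is None:
            return None
        rest = take_rest(rest, skip)
    return rest


# Build a 65-column test line whose last column contains extra semicolons.
line = ";".join(f"c{i}" for i in range(1, 65)) + ";note; with; semicolons"
print(column_65(line))
```

A line with fewer than 65 columns yields None in this model; how the batch version treats such lines is not covered here.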

Aside from its load time, this PowerShell script may be faster on larger files:

## Q:\Test\2019\05\16\SO_56171667_2.ps1
$FileIn  = 'Col65.csv'
$FileOut = 'NewCol65.csv'
Get-Content $FileIn | ForEach-Object{
  $Cols = $_ -split ';',65
  $Cols[-1] = $Cols[-1].Replace(';',',')
  $Cols -join ';'
} | Set-Content $FileOut
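
If you want to check the result of the PowerShell script against an independent implementation, the same split/replace/join idea can be sketched in Python. The name fix_line and its parameters are assumptions for this example. Note the off-by-one between the two languages: PowerShell's -split ';',65 caps the number of pieces at 65, while Python's maxsplit counts splits, so the equivalent value is 64.

```python
def fix_line(line: str, columns: int = 65, sep: str = ";", new: str = ",") -> str:
    """Split into at most `columns` pieces and clean only the last one (hypothetical helper)."""
    # maxsplit = columns - 1 leaves everything after the last kept separator in cols[-1].
    cols = line.split(sep, columns - 1)
    cols[-1] = cols[-1].replace(sep, new)
    return sep.join(cols)


sample = "marshal;Stevens;Son;11223344;Dual;this person tries food; water; fruit"
print(fix_line(sample, columns=6))
```

columns=6 is used here only because the sample line from the question has six columns; for the real file the default of 65 applies.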

To run it from a batch file:

:: Q:\Test\2019\05\16\SO_56171667_2.cmd
@Echo off
Set "FileIn=Col65.csv"
Set "Fileout=NewCol65.csv"

powershell -NoP -C "gc '%FileIn%'|ForEach-Object{$Cols=$_ -split ';',65;$Cols[-1]=$Cols[-1].Replace(';',',');$Cols -join ';'}|Set-Content '%FileOut%'"

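How to find and replace a character in a text file line by line from the Windows command line, but only from a specific occurrence count onward?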
