Question

I have a text file with the following structure:

attribute:: some_very_long_line

It was created with findstr from a larger txt file, but I only need to keep the some_very_long_line part.

In the happy land of Cockayne, I would just type something like:

attribute:: some_very_long_line

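But because the content of some_very_long_line is very long, and the maximum length of any command line (or variable) in CMD is only 8191 characters, I have not managed to get this to work.

For some unfortunate reasons, I have to do this in cmd. I know how to do it in bash with sed, or with PowerShell, but sadly that is not an option right now. Thanks in advance.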

Answer 1

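Handling very long strings in a pure batch script is tedious, but possible.

The set /P command is quite useful here because, when there are no line breaks, it reads a redirected input file in chunks of 1023 bytes/characters. The copy command has an ASCII mode (/A), in which a source file is read up to, but not including, the first end-of-file (EOF) character (also called SUB; ASCII 0x1A), truncating everything behind it. This can be (mis)used to write strings safely without producing a line break (as opposed to echo with a redirected output file).

Here is a commented script that uses these commands: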

@echo off
setlocal EnableExtensions DisableDelayedExpansion

rem /* Define constants here: */
set "_FILE=%~dp0my_preciouss.txt"      & rem /* (path to the target file) */
set "_DROP=attribute::"                & rem /* (string to strip; everything up to that
                                         rem     is omitted, so maybe `::` was enough?) */
set "_TMP1=%TEMP%\%~n0_1_%RANDOM%.tmp" & rem /* (path to a temporary file) */
set "_TMP2=%TEMP%\%~n0_2_%RANDOM%.tmp" & rem /* (path to a temporary file) */

rem /* Determine number of chunks of 1023 bytes: */
for %%F in ("%_FILE%") do 2> nul set /A "NUM=%%~zF, NUM/=1023"

rem /* Gather end-of-file (EOF) character by using `copy` in ASCII text mode: */
> nul copy /Y /A nul "%_TMP1%" & for /F "usebackq" %%S in ("%_TMP1%") do set "EOF=%%S"

rem /* Read from the given file: */
< "%_FILE%" (
    setlocal EnableDelayedExpansion
    rem /* Read first chunk of 1023 bytes from the file using `set /P` together with
    rem    input redirection (`<`); remove everything up to the given drop string
    rem    from the chunk and write it plus a trailing EOF to a temporary file: */
    set /P LINE="" && > "!_TMP1!" (
        for /F "tokens=* eol= " %%L in (" !LINE:*%_DROP%=!!EOF!") do (
            endlocal & (echo(%%L) & setlocal EnableDelayedExpansion
        )
    )
    rem /* Read the remaining file chunk by chunk and append each to the temporary file,
    rem    using `copy` in ASCII mode, in which it treats the EOF (end-of-file) character
    rem    as such and truncates it and everything behind; that way, you can get rid of
    rem    the trailing line-break that the `echo` command appends as it is behind EOF: */
    for /L %%I in (0,1,%NUM%) do (
        set /P LINE="" && (
            > "!_TMP2!" echo(!LINE!!EOF!
            > nul copy /Y /A "!_TMP1!" + "!_TMP2!" "!_TMP1!"
        )
    )
    rem /* Append a final line-break (carriage-return plus line-feed) to the file;
    rem    if you do not need that, remove the whole `echo` command line and replace
    rem    `"!_TMP2!"` by `nul` in the `copy` command line: */
    > "!_TMP2!" echo/
    > nul copy /Y /A "!_TMP1!" + "!_TMP2!" "!_TMP1!" /B
    endlocal
)

rem /* Replace original file and clean up temporary files: */
> nul copy /Y /A "%_TMP1%" "%_FILE%" /B && 2> nul del "%_TMP1%" "%_TMP2%"

endlocal
exit /B
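
To make the EOF trick easier to follow, here is a small Python model of it. It is only a sketch of the behavior as described above, not of how copy is implemented; the names `ascii_mode_read`, `echo_with_eof` and `join_chunks` are made up for this illustration.

```python
SUB = b"\x1a"    # the EOF/SUB character (ASCII 0x1A) described above
CRLF = b"\r\n"   # the line break that `echo` appends


def ascii_mode_read(data: bytes) -> bytes:
    """Model of an ASCII-mode read: keep the bytes before the first SUB."""
    head, _sep, _tail = data.partition(SUB)
    return head


def echo_with_eof(chunk: bytes) -> bytes:
    """Model of `echo(!LINE!!EOF!` redirected to a file: chunk, SUB, CRLF."""
    return chunk + SUB + CRLF


def join_chunks(chunks) -> bytes:
    """Append chunks the way the script does, so no line break ends up between them."""
    out = b""
    for chunk in chunks:
        # The CRLF sits behind the SUB, so the ASCII-mode read drops it.
        out += ascii_mode_read(echo_with_eof(chunk))
    return out
```

Because the line break written by echo always lands behind the SUB character, it never reaches the combined file.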

Answer 2

You could write a simple utility in Python or C++ (or really any language that lets you pass it command-line args) to do what you need, and call it like this:

shortfile "my_preciouss.txt"

where shortfile is the utility you wrote as described above, and the file name is (obviously) the file you wish to shorten.
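
A minimal sketch of what such a utility might look like in Python. Everything here is an assumption made for illustration: the function name `strip_prefix`, the `CHUNK_SIZE` value, and the choice of `":: "` as the marker (taken from the file structure in the question). It streams the file in chunks, so the length of the line does not matter.

```python
import os
import sys
import tempfile

CHUNK_SIZE = 64 * 1024  # arbitrary read size chosen for this sketch


def strip_prefix(path: str, marker: bytes = b":: ") -> None:
    """Drop everything up to and including the first `marker` in the file.

    Assumes the marker occurs within the first CHUNK_SIZE bytes; if it is
    not found there, the file is left untouched.
    """
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, "rb") as src:
        first = src.read(CHUNK_SIZE)
        pos = first.find(marker)
        if pos < 0:
            return
        # Write to a temporary file next to the original, then swap it in.
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "wb") as dst:
            dst.write(first[pos + len(marker):])
            while True:
                chunk = src.read(CHUNK_SIZE)
                if not chunk:
                    break
                dst.write(chunk)
    os.replace(tmp_path, path)


# Only act when exactly one file name is given on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    strip_prefix(sys.argv[1])
```

Saved as, say, shortfile.py, it could be wrapped in a one-line shortfile.bat that calls python with the file name, which keeps the call from cmd as short as the one shown above.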

Answer 3

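As you have discovered, native batch is terrible at text manipulation.

It is theoretically possible to solve this with "pure" batch and native external commands. The "simplest" way is probably to use CERTUTIL to write the file as hex, then scan the hex and write a new hex file containing the values after the first space (hex 20), and then use CERTUTIL to convert back to ASCII. But I cannot bear to go down that painful road.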
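
To show the idea behind the hex route without the batch pain, here is a rough Python sketch. It models only the logic (encode to hex, skip through the first 20, decode again) and says nothing about CERTUTIL's actual output format; `strip_via_hex` is a made-up name.

```python
import binascii


def strip_via_hex(data: bytes) -> bytes:
    """Hex-encode, drop everything through the first space (hex 20), decode back."""
    hex_text = binascii.hexlify(data).decode("ascii")
    # Step two hex digits (one byte) at a time, so "20" is only matched on a
    # byte boundary and not across two neighbouring bytes such as 02 00.
    for i in range(0, len(hex_text), 2):
        if hex_text[i:i + 2] == "20":
            return binascii.unhexlify(hex_text[i + 2:])
    return data  # no space found: leave the input unchanged
```

The byte-boundary check is the part that is easy to get wrong when scanning hex text by hand.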

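Check out JREPL.BAT - a regular expression find/replace utility written as pure script (hybrid JScript/batch) that runs natively on any modern Windows machine from XP onward.

Use jrepl /?help to see all the types of built-in help. jrepl /?options gives you a quick summary of all available options.

Based on your vague description, I see a couple of ways to process the file.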

call jrepl "^attribute:: " "" /f "my_preciouss.txt" /o -

or

call jrepl "^.*? +" "" /f "my_preciouss.txt" /o -
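
If it helps to see what the two search patterns do, here they are applied with Python's re module. This is only a stand-in: JREPL uses the JScript regex engine, though as far as I know these two simple patterns behave the same in both. The function names are made up.

```python
import re


def strip_literal(line: str) -> str:
    """First pattern: remove the literal prefix at the start of the line."""
    return re.sub(r"^attribute:: ", "", line)


def strip_to_first_space(line: str) -> str:
    """Second pattern: lazily match up to the first run of spaces and remove it."""
    return re.sub(r"^.*? +", "", line)
```

The first only touches lines that start with that exact attribute name, while the second strips whatever precedes the first run of spaces.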

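But if you are comfortable with sed, I am sure you can come up with your own solution.

I suspect you could avoid this cleanup step entirely if you ditch FINDSTR and use JREPL to generate the correct file in the first place.

Note that CALL doubles quoted ^ characters, but that is not a problem for the start-of-line anchor, because ^^search is equivalent to ^search. It could, however, cause problems with something like [^xyz]. If you run the command directly on the command line, you can remove CALL and avoid the issue. If you are in a batch script, you can avoid the caret doubling problem by adding the /xseq option and using \c in place of ^.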
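
A quick way to see why the doubled caret is harmless as an anchor but not inside a character class, again using Python's re as a stand-in for the JScript engine (the helper `find` is made up for this illustration):

```python
import re


def find(pattern: str, text: str) -> list:
    """Return all non-overlapping matches of pattern in text."""
    return re.findall(pattern, text)


# As an anchor, the second ^ is just another zero-width start-of-line assertion.
anchor_single = find(r"^search", "search me")
anchor_double = find(r"^^search", "search me")

# Inside a class, the second ^ is a literal caret that becomes excluded too.
class_single = find(r"[^xyz]", "a^x")
class_double = find(r"[^^xyz]", "a^x")
```

Here the two anchor results are identical, but class_double no longer matches the literal ^ that class_single does.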

Trimming a big line in a batch file
