Question

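The idea is to get each URL that returned a 404 error together with the IDs above it, to show which IDs that URL belongs to, and additionally to find the FileName text and add it to the output file.

I have been trying to loop FINDSTR to find lines from previously found line numbers. Can anyone help?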

Sample file:

FileName:  LastABC-1563220.xml
-------------------------------
123456786
12348
1234DEF
-------------------------------
http://Product.com/1234DEF
HTTP/1.1 404 Not Found - 0.062000
http://Product.com/1234DEF_1
HTTP/1.1 200 OK - 0.031000
123456785
12349
1234EFG
-------------------------------
http://Product.com/1234EFG
HTTP/1.1 200 OK - 0.031000
123456784
12340
1234FGH
-------------------------------
http://Product.com/1234FGH
HTTP/1.1 200 OK - 0.031000
http://Product.com/1234FGH_1
HTTP/1.1 404 Not Found - 0.079000
http://Product.com/1234FGH_2
HTTP/1.1 404 Not Found - 0.067000
http://Product.com/1234FGH_4
HTTP/1.1 404 Not Found - 0.047000

Desired output:

FileName:  LastABC-1563220.xml
123456786 12348 1234DEF
http://Product.com/1234DEF

123456784 12340 1234FGH
http://Product.com/1234FGH_1
http://Product.com/1234FGH_2
http://Product.com/1234FGH_4

The script I have so far:

del "%FailingURLS%" 2>nul
set numbers=
for /F "delims=:" %%a in ('findstr /I /N /C:"404 Not Found" %Formatedfile%') do (
    set /A before=%%a-1
    set "numbers=!numbers!!before!: "
)
(for /F "tokens=1* delims=:" %%a in ('findstr /N "^" %Formatedfile% ^| findstr /B "%numbers%"') do echo %%b) > %FailingURLS%

Answer 1

This is the way I would do it:

@echo off
setlocal EnableDelayedExpansion

del PreviousLines.txt 2>nul
set "ids="
(for /F "delims=" %%a in (test.txt) do (
   set "line=%%a"
   if "!line:~0,9!" equ "FileName:" (
      echo(!line!>> PreviousLines.txt
   ) else if "!line:~0,5!" equ "http:" (
      if defined ids echo(!ids!>> PreviousLines.txt
      set "ids="
      echo(!line!>> PreviousLines.txt
   ) else if "!line:~0,4!" equ "HTTP" (
      rem It is an "OK" or "Not Found" line...
      rem If is "Not Found", show previous lines
      if "!line:Not Found=!" neq "!line!" type PreviousLines.txt
      rem Anyway, reset previous lines
      del PreviousLines.txt 2>nul
      set "ids="
   ) else if "!line:~0,5!" neq "-----" (
      set "ids=!ids!!line! "
   )
)) > FailingURLS.txt

Output:

FileName:  LastABC-1563220.xml
123456786 12348 1234DEF 
http://Product.com/1234DEF
http://Product.com/1234FGH_1
http://Product.com/1234FGH_2
http://Product.com/1234FGH_4

I don't understand why you show the IDs 123456784 12340 1234FGH before http://Product.com/1234FGH_1, because those IDs belong to http://Product.com/1234FGH, which is OK...

Answer 2

Your question is too broad as it stands, so the following example shows one way to retrieve the "404" URLs from the file, which I assume is your main issue.

@Echo Off
SetLocal EnableExtensions DisableDelayedExpansion
Set "Src=formattedfile.txt"
Set "Str=404 Not Found"
(Set LF=^
% 0x0A %
)
For /F %%A In ('Copy /Z "%~f0" Nul')Do Set "CR=%%A"
SetLocal EnableDelayedExpansion
FindStr /RC:".*!CR!*!LF!.*%Str%" "%Src%"
EndLocal
Pause

Just change the value on line 3 (Src) to match the name of your formatted text file.

Output for the file content you provided:

http://Product.com/1234DEF
http://Product.com/1234FGH_1
http://Product.com/1234FGH_2
http://Product.com/1234FGH_4
Press any key to continue . . .

Answer 3

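Here is a script (let's call it extract-failed-urls.bat) that demonstrates one possible way to accomplish the task, with some explanatory rem comments to help you understand what happens: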

@echo off
setlocal EnableExtensions DisableDelayedExpansion

rem // Define constants here:
set "_FILE=%~1"      & rem // (`%~1` represents the first command line argument)
set "_URLP=://"      & rem // (partial string that every listed URL contains)
set "_RESP=HTTP/1.1" & rem // (partial string that every response begins with)
set "_ERRN=404"      & rem // (specific error number in response to recognise)

rem // Determine the total number of lines contained in the given file:
(for /F %%C in ('^< "%_FILE%" find /C /V ""') do set "CNT=%%C") || goto :EOF
rem // Read from the given file:
< "%_FILE%" (
    rem // Clear IDs and URL buffer, and preset flag:
    set "IDS=" & set "URL=" & set "FLAG=#"
    setlocal EnableDelayedExpansion
    rem // Read and write first line of file separately:
    set /A "CNT-=1" & set "LINE=" & set /P LINE="" & < nul set /P ="!LINE!"
    rem // Loop through the remaining lines:
    for /L %%I in (1,1,!CNT!) do (
        rem // Read a line and process only non-empty ones:
        set /P LINE="" && (
            rem // Try to split off response prefix:
            set "REST=!LINE:*%_RESP% =!"
            rem // Determine kind of current line:
            if "!LINE:-=!" == "" (
                rem // Line contains only hyphens `-`, so clear URL buffer:
                set "URL="
            ) else if not "!LINE!" == "!LINE:*%_URLP%=!" (
                rem // Line contains an URL, so store to URL buffer, set flag:
                set "URL=!LINE!" & set "FLAG=#"
            ) else if "!LINE!" == "%_RESP% !REST!" (
                rem // Line contains a response, so gather number:
                for /F %%R in ("!REST!") do (
                    rem /* Specific error encountered, hence write IDs, if any,
                    rem    clear IDs buffer, then write stored URL, if any: */
                    if "%%R" == "%_ERRN%" (
                        if defined IDS echo/& echo(!IDS!
                        set "IDS=" & if defined URL echo(!URL!
                    )
                )
                rem // Clear URL buffer and set flag:
                set "URL=" & set "FLAG=#"
            ) else (
                rem /* No other condition fulfilled, hence line contains an ID,
                rem    so put ID into IDs buffer, clear URL buffer and flag: */
                if defined FLAG (set "IDS=!LINE!") else set "IDS=!IDS! !LINE!"
                set "URL=" & set "FLAG="
            )
        )
    )
    endlocal
)

endlocal
exit /B

To run it against an input file named sample.txt, use a command line like this:

extract-failed-urls.bat "sample.txt"

To write the output to another file named failed-urls.txt, use this:

extract-failed-urls.bat "sample.txt" > "failed-urls.txt"

Using the data from the sample input file in the question, the output is as follows:

FileName:  LastABC-1563220.xml
123456786 12348 1234DEF
http://Product.com/1234DEF

123456784 12340 1234FGH
http://Product.com/1234FGH_1
http://Product.com/1234FGH_2
http://Product.com/1234FGH_4

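This approach distinguishes the following kinds of input lines, and recognising each kind triggers the corresponding actions:

the first line (the line beginning with FileName:):
- just output the line unedited (without a trailing line break);
a line containing only hyphens (-------------------------------):
- clear the buffer holding the (last) URL;
a line containing ://:
- store (overwrite) the URL in the buffer;
- set the flag to clear the ID buffer (later);
a line beginning with HTTP/1.1 + SPACE:
- if the error number is 404:
  - output the contents of the ID buffer, if any (preceded by a line break);
  - clear the ID buffer;
  - output the contents of the URL buffer, if any;
- clear the buffer holding the (last) URL;
- set the flag to clear the ID buffer (later);
any other line:
- if the flag to clear the ID buffer is set, clear the buffer;
- append the ID to the ID buffer (SPACE-separated);
- clear the buffer holding the (last) URL;
- reset the flag for clearing the ID buffer;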

Multiple FINDSTR commands to get the desired result
