Question

I have this text in a text file (test.txt):

BLA BLA BLA
BLA BLA
Found 11 errors and 7 warnings

I run this command:

findstr /r "[0-9]+ errors" test.txt

My goal is to get only the string 11 errors.

Instead, the output is:

Found 11 errors and 7 warnings

Can anyone help?

Answer 1

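findstr always returns every line that contains a match; it cannot return just the matching substring. You therefore have to extract the substring yourself. That said, there are a few problems in your findstr command line that I want to point out:

The string argument of findstr actually defines multiple search strings separated by spaces, so one search string is [0-9]+ and the other is errors. The line Found 11 errors and 7 warnings is returned only because of the word errors; the numeric part does not contribute to the match, because findstr does not support the + quantifier (one or more occurrences of the preceding character or class). You need to change that part of the search string to [0-9][0-9]* to achieve this. To have the whole string treated as a single search string, you need the /C option; since /C defaults to literal search mode, you also need to add the /R option explicitly: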

findstr /R /C:"[0-9][0-9]* errors" "test.txt"
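
To make the difference between the two forms visible, here is a minimal batch sketch. It assumes a throwaway sample file in %TEMP%; the file name findstr_demo.txt and the second sample line are made up for illustration. With the space-separated form, both lines should be printed because each contains the word errors; with /R /C:, only the line that has a number followed by " errors" should be printed.

```batch
@echo off
setlocal
rem Hypothetical sample file, used only for this demonstration.
set "SAMPLE=%TEMP%\findstr_demo.txt"
> "%SAMPLE%" echo Found 11 errors and 7 warnings
>>"%SAMPLE%" echo no digits here, only errors

echo --- Two search strings, [0-9]+ and errors; either one is enough:
findstr /R "[0-9]+ errors" "%SAMPLE%"

echo --- One regular expression passed with /R /C:
findstr /R /C:"[0-9][0-9]* errors" "%SAMPLE%"

del "%SAMPLE%"
endlocal
```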

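However, this would also match strings like x5 errorse. To avoid that, you can use the word boundaries \< (beginning of a word) and \> (end of a word). (Alternatively, you could include a space on either side of the search string, that is /C:" [0-9][0-9]* errors ", but this fails when the match sits at the very beginning or end of a line.)

Putting all of this together, the corrected and improved command line looks like this: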

findstr /R /C:"\<[0-9][0-9]* errors\>" "test.txt"

This returns the whole line containing the match:

Found 11 errors and 7 warnings
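
The effect of the word boundaries can be checked with a small batch sketch along the same lines. The sample file name boundary_demo.txt is an assumption for illustration; the x5 errorse line is the counterexample mentioned above. The pattern without boundaries should print both lines, the pattern with \< and \> only the first.

```batch
@echo off
setlocal
rem Hypothetical sample file, used only for this demonstration.
set "SAMPLE=%TEMP%\boundary_demo.txt"
> "%SAMPLE%" echo Found 11 errors and 7 warnings
>>"%SAMPLE%" echo x5 errorse

echo --- Without word boundaries:
findstr /R /C:"[0-9][0-9]* errors" "%SAMPLE%"

echo --- With word boundaries:
findstr /R /C:"\<[0-9][0-9]* errors\>" "%SAMPLE%"

del "%SAMPLE%"
endlocal
```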

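If you want to return only lines of this kind and exclude lines such as 2 errors are enough or 35 warnings but less than 3 errors, you can of course extend the search string accordingly: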

findstr /R /C:"^Found [0-9][0-9]* errors and [0-9][0-9]* warnings$" "test.txt"

In any case, to extract the part 11 errors, there are a couple of options:

A for /F loop can parse the output of findstr and extract certain tokens:

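rem Split each matching line at spaces and print tokens 2 and 3 (here: 11 errors).
rem This relies on the number being the second word of the line.
for /F "tokens=2-3 delims= " %%E in ('
    findstr /R /C:"\<[0-9][0-9]* errors\>" "test.txt"
') do echo(%%E %%F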
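
If the number is needed later in a script rather than just printed, the same token idea can store it in a variable. The following is a sketch under assumptions: the variable name COUNT is made up, and the pattern is tied to the beginning of the line with ^Found so that token 2 really is the error count.

```batch
@echo off
setlocal
set "COUNT="
rem "^Found" ties the match to the summary line, so token 2 is the error count.
for /F "tokens=2 delims= " %%E in ('
    findstr /R /C:"^Found [0-9][0-9]* errors\>" "test.txt"
') do set "COUNT=%%E"
if defined COUNT (
    echo Errors: %COUNT%
) else (
    echo No matching line found.
)
endlocal
```

If several lines match, COUNT keeps the value from the last one.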

Substring substitution syntax can be used as well:

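rem Store the matching line in LINE (the last one, if several lines match).
for /F "delims=" %%L in ('
    findstr /R /C:"\<[0-9][0-9]* errors\>" "test.txt"
') do set "LINE=%%L"
rem Remove everything up to and including the first space, that is "Found ".
set "LINE=%LINE:* =%"
rem Replace " and " with a closing quote followed by a rem command, which cuts off the rest.
set "LINE=%LINE: and =" & rem "%"
echo(%LINE%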

Answer 2

The findstr tool cannot be used to extract only the matching text. This is much easier with PowerShell.

Here is an example:

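# Input file, output file, and the pattern to extract; adjust the paths to your files.
$input_path = 'c:\ps\in.txt'
$output_file = 'c:\ps\out.txt'
$regex = '[0-9]+ errors'
# Find every match in the file and write only the matched text to the output file.
Select-String -Path $input_path -Pattern $regex -AllMatches | ForEach-Object { $_.Matches } | ForEach-Object { $_.Value } > $output_file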

For information on how to use the script above, see the article "Windows PowerShell: Extracting Strings Using Regular Expressions".

Findstr - return only the regular expression match
