Group and count the lines of a text file in Windows CMD

Time: 2014-02-03 17:45:33

Tags: sorting cmd grouping

I have a long file of identifiers, for example:

A
A
B
C
A
C

I want to group, count, and sort them to produce a file like this:

A 3
C 2
B 1

How can I do this in a CMD script?

1 answer:

Answer 0 (score: 2)

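Global EDIT: all code has been modified to allow "-" in identifiers. Identifiers must not contain "!".

Assuming the identifiers do not contain "=", "$", or "!", and that they are case-insensitive, the following lists the count of each identifier, sorted by identifier.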

@echo off
setlocal enableDelayedExpansion

:: Clear any existing $ variables
for /f "delims==" %%V in ('set $ 2^>nul') do set "%%V="

:: Get a count of each identifier
for /f "usebackq delims=" %%A in ("test.txt") do (
  set /a "cnt=!$%%A!+1"
  set "$%%A=!cnt!"
)

:: Write the results to a new file
>output.txt (
  for /f "tokens=1,2 delims=$=" %%A in ('set $') do echo %%A %%B
)

:: Show the result
type output.txt

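You can change the "$" prefix as needed. However, this technique cannot be used if the identifiers are case-sensitive, because environment variable names are case-insensitive.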
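A minimal sketch of why the case-sensitivity limit exists: it reuses the answer's counting idiom, but loops over a hypothetical inline list (A a B A) instead of reading test.txt. Because "$a" and "$A" name the same environment variable, "a" and "A" end up sharing one counter.

```batch
@echo off
setlocal enableDelayedExpansion

:: Same counting idiom as above, but over a hypothetical inline list
:: instead of test.txt. An undefined counter expands to nothing, so the
:: first occurrence evaluates "+1" and stores 1.
:: Variable names are case-insensitive, so "a" and "A" update one counter.
for %%A in (A a B A) do (
  set /a "cnt=!$%%A!+1"
  set "$%%A=!cnt!"
)

:: Display the counters
set $
```

After the loop, the shared A/a counter holds 3 and the B counter holds 1, so a case-sensitive count cannot be recovered from these variables.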

EDIT

Here is a version that sorts the results by count in descending order:

@echo off
setlocal enableDelayedExpansion

:: Clear any existing $ variables
for /f "delims==" %%V in ('set $ 2^>nul') do set "%%V="

:: Get a count of each identifier
for /f "usebackq delims=" %%A in ("test.txt") do (
  set /a "cnt=!$%%A!+1"
  set "$%%A=!cnt!"
)

:: Write a temp file with zero padded counts prefixed to the left.
>temp.txt (
  for /f "tokens=1,2 delims=$=" %%A in ('set $') do (
    set "cnt=000000000000%%B"
    echo !cnt:~-12!=%%A=%%B
  )
)

:: Sort and write the results to a new file
>output.txt (
  for /f "tokens=2,3 delims=$=" %%A in ('sort /r temp.txt') do echo %%A %%B
)
del "temp.txt"

:: Show the result
type output.txt
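The zero padding in the temp file matters because SORT compares lines as text, so an unpadded "10" would sort before "9". This small sketch (with hypothetical counts 9, 10 and 3) isolates the padding step used above: prefix 12 zeros, then keep only the last 12 characters.

```batch
@echo off
setlocal enableDelayedExpansion

:: SORT compares text, so counts must be padded to a fixed width.
:: Prefix 12 zeros, then keep only the last 12 characters.
for %%N in (9 10 3) do (
  set "cnt=000000000000%%N"
  echo !cnt:~-12!
)
```

This prints 000000000009, 000000000010 and 000000000003. In that padded form, a text sort such as SORT /R orders them 10, 9, 3, which matches numeric order.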

EDIT 2

Here is another option that sorts by count in descending order, assuming REPL.BAT is somewhere in your PATH:

@echo off
setlocal enableDelayedExpansion

:: Clear any existing $ variables
for /f "delims==" %%V in ('set $ 2^>nul') do set "%%V="

:: Get a count of each identifier
for /f "usebackq delims=" %%A in ("test.txt") do (
  set /a "cnt=!$%%A!+1"
  set "$%%A=!cnt!"
)

:: Sort result by count descending and write to output file
set $|repl "\$(.*)=(.*)" "000000000000$2=$1 $2"|repl ".*(.{12}=.*)" $1|sort /r|repl ".{13}(.*)" $1 >output.txt

:: Show the result
type output.txt