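How to sort the contents of a text file using a shell script

Asked: 2014-12-02 15:18:50

Tags: bash shell unix

I'm new to shell scripting. I'd like to know how to sort the contents of a file with a shell script.

Here is an example: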

fap0089-josh.baker
fap00233-adrian.edwards
fap00293-bob.boyle
fap00293-bob.jones
fap002-brian.lopez
fap00293-colby.morris
fap00293-cole.mitchell
psf0354-SKOWALSKI
psf0354-SLEE
psf0382-SLOWE
psf0391-SNOMURA
psf0354-SPATEL
psf0364-SRICHARDS
psf0354-SSEIBERT
psf0354-SSIRAH
bsi0004-STRAN
bsi0894-STURBIC
unit054-SUNDERWOOD

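Given the data above (this is a very small set; I have more than 5.5 records), I would like to sort it as follows:

  1. The number of entries starting with fap, psf, bsi, unit, etc.
  2. The total number of environments per type. Each number after the word (0004, 0382, 054, etc.) is an environment. For example, psf has 4 unique environments.
  3. The total sum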

1 Answer:

Answer 0 (score: 2)

Here is a Schwartzian transform that sorts by 1) the leading letters, then 2) the digits:

sed -r 's/^([[:alpha:]]+)([[:digit:]]+)/\1 \2 /' filename | 
sort -t ' ' -k 1,1 -k 2,2n | 
sed 's/ //; s/ //'

Output:

bsi0004-STRAN
bsi0894-STURBIC
fap002-brian.lopez
fap0089-josh.baker
fap00233-adrian.edwards
fap00293-bob.boyle
fap00293-bob.jones
fap00293-colby.morris
fap00293-cole.mitchell
psf0354-SKOWALSKI
psf0354-SLEE
psf0354-SPATEL
psf0354-SSEIBERT
psf0354-SSIRAH
psf0364-SRICHARDS
psf0382-SLOWE
psf0391-SNOMURA
unit054-SUNDERWOOD
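
The three stages of the pipeline above (decorate, sort, undecorate) are easier to see on a smaller input. The following is a minimal sketch rather than part of the original answer: the function name `sort_by_prefix_then_number` and the four sample lines are chosen for illustration, and it uses `sed -E` in place of `sed -r` on the assumption that your sed accepts that spelling.

```shell
# Hypothetical helper: reads lines of the form <letters><digits><rest> on stdin.
# Stage 1 (decorate):   fap0089-josh.baker  ->  fap 0089 -josh.baker
# Stage 2 (sort):       field 1 as text, then field 2 as a number
# Stage 3 (undecorate): remove the two spaces that stage 1 inserted
sort_by_prefix_then_number() {
    sed -E 's/^([[:alpha:]]+)([[:digit:]]+)/\1 \2 /' |
        sort -t ' ' -k 1,1 -k 2,2n |
        sed 's/ //; s/ //'
}

# Show the decorated intermediate form for a few sample lines.
printf '%s\n' psf0382-SLOWE fap0089-josh.baker fap002-brian.lopez bsi0894-STURBIC |
    sed -E 's/^([[:alpha:]]+)([[:digit:]]+)/\1 \2 /'

# Run the whole pipeline on the same lines.
printf '%s\n' psf0382-SLOWE fap0089-josh.baker fap002-brian.lopez bsi0894-STURBIC |
    sort_by_prefix_then_number
```

Because the second key is compared numerically (`-k 2,2n`), `002` sorts before `0089` even though it would come later in a plain text comparison.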

To generate the metrics you mentioned, I would use perl:

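perl -nE '
    # Skip lines that do not start with letters followed by digits.
    /^([[:alpha:]]+)(\d+)/ or next;
    $count{$1}++;       # entries per prefix
    $nenv{$1}{$2}=1;    # distinct environments per prefix
    $total+=$2          # running sum of the environment numbers
    # The lone } below closes the implicit loop that -n wraps around the code;
    # the closing } of the END block is supplied by that same wrapper.
} 
END {
    say "Counts:";
    say "$_ => $count{$_}" for sort keys %count;
    say "Number of environments";
    say "$_ => ", scalar keys %{$nenv{$_}} for sort keys %nenv;
    say "Total = $total";
' filename

Output:

Counts: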
bsi => 2
fap => 7
psf => 8
unit => 1
Number of environments
bsi => 2
fap => 4
psf => 4
unit => 1
Total = 5355

Without perl it is less efficient, because you have to read the file multiple times.

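# Entries per prefix: drop everything from the first digit onward, then count.
echo Counts:
sed 's/[0-9].*//' filename | sort | uniq -c
# Distinct environments per prefix: keep unique "prefix number" pairs, then count per prefix.
echo Number of environments:
sed -r 's/^([a-z]+)([0-9]*).*/\1 \2/' filename | sort -u | cut -d" " -f1 | uniq -c
# Sum: strip the letters and leading zeros (so printf does not read the numbers as octal),
# join the numbers with "+", and let bc add them up.
echo Total:
{ printf "%d+" $(sed -r 's/^[a-z0]+([0-9]*).*/\1/' filename); echo 0; } | bc

Output:

Counts: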
      2 bsi
      7 fap
      8 psf
      1 unit
Number of environments:
      2 bsi
      4 fap
      4 psf
      1 unit
Total:
5355
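
If perl is not available, awk might also do all three tallies in a single pass. This is a sketch and not part of the original answer: the function name `summarize` and the output layout (prefix, entry count, distinct environments) are assumptions made for illustration, and it assumes every line of interest starts with ASCII letters followed by digits.

```shell
# Hypothetical helper: reads the records on stdin and prints, for each prefix,
# "<prefix> <entries> <distinct environments>", followed by the overall sum.
summarize() {
    awk '
        # Only handle lines that begin with letters.
        match($0, /^[A-Za-z]+/) {
            prefix = substr($0, 1, RLENGTH)
            rest = substr($0, RLENGTH + 1)
            # Skip the line if no digits follow the prefix.
            if (!match(rest, /^[0-9]+/)) next
            env = substr(rest, 1, RLENGTH)
            count[prefix]++
            # Count each (prefix, environment) pair only once.
            if (!((prefix, env) in seen)) {
                seen[prefix, env] = 1
                nenv[prefix]++
            }
            total += env    # the digit string is converted to a decimal number
        }
        END {
            # The for-in order is unspecified, so pipe the lines through sort.
            for (p in count)
                printf "%s %d %d\n", p, count[p], nenv[p] | "sort"
            close("sort")
            printf "Total %d\n", total
        }
    '
}

printf '%s\n' psf0354-SLEE psf0354-SPATEL psf0382-SLOWE bsi0004-STRAN | summarize
```

To run it on the real data, redirect the file into the function: `summarize < filename`.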