Word frequency and -gt

Time: 2017-04-18 20:15:12

Tags: linux bash shell awk sed

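My code checks a file and displays the frequency of every word in it, but I would like to know how to display only the words whose length is greater than a variable k. Here is my code: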

#!/bin/bash
if [ $# -eq 0 ]; then

    echo "you need an argument"
    exit 2
fi

echo "Insert k"
read k
for file in $@; do
    if ! [ -f $file ]; then
    echo "Not a file"
    exit 2
    fi
    sed -e 's/\s/\n/g' < $file | sort | uniq -c | sort -nr
done

File contents:

ceva
ceva
aiurea
sebi
este
cel
mai
smecher

Output:

     2 ceva
     1 smecher
     1 sebi
     1 mai
     1 este
     1 cel
     1 aiurea

2 answers:

Answer 0 (score: 3)

Use awk to count the frequency of the words whose length is greater than a variable:

awk -v k=3 'length() > k { freq[$0]++} END{for (i in freq) print freq[i], i}' file |
sort -rn

2 ceva
1 smecher
1 sebi
1 este
1 aiurea
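
The one-liner above treats every input line as a single word, which matches the sample file. If a line can hold several words, one possible variation is to loop over awk's fields instead. This is only a sketch: `count_long_words` is a hypothetical helper name, and it assumes words are separated by blanks.

```bash
#!/usr/bin/env bash
# Sketch: count words longer than k when a line may contain several words.
# count_long_words is a hypothetical name, not part of the answer above.
count_long_words() {
    local k=$1
    shift
    awk -v k="$k" '
        {
            # Default field splitting breaks each line on runs of blanks.
            for (i = 1; i <= NF; i++)
                if (length($i) > k)
                    freq[$i]++
        }
        END { for (w in freq) print freq[w], w }
    ' "$@" | sort -rn
}

# Example: the sample words spread over two lines, with k = 3.
printf 'ceva ceva aiurea\nsebi este cel mai smecher\n' | count_long_words 3
```

With no file arguments the function reads standard input; otherwise pass file names after k, for example `count_long_words 3 file`.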

Full script:

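#!/usr/bin/env bash
# Require at least one file argument.
if [[ $# -eq 0 ]]; then
    echo "you need an argument"
    exit 2
fi

# Words must be strictly longer than k to be counted.
read -r -p "Insert k: " k

for file in "$@"; do
    if [[ ! -f $file ]]; then
        echo "$file is not a file"
        exit 2
    fi

    echo "$file:"
    # Each input line is treated as one word; count those longer than k.
    awk -v k="$k" 'length() > k { freq[$0]++ } END { for (i in freq) print freq[i], i }' "$file" | sort -rn
done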
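
The script passes whatever the user types straight to awk. If k is not numeric, awk compares `length()` with it as a string, which is probably not what is wanted. A minimal sketch of a check that could run right after `read` follows; `is_non_negative_int` is a hypothetical helper name, and the assumption is that k should be a non-negative integer.

```bash
#!/usr/bin/env bash
# Sketch: accept k only if it consists of one or more decimal digits.
# is_non_negative_int is a hypothetical helper, not part of the answer above.
is_non_negative_int() {
    [[ $1 =~ ^[0-9]+$ ]]
}

# Demonstration with fixed values instead of interactive input.
for candidate in 3 abc -1 ""; do
    if is_non_negative_int "$candidate"; then
        echo "'$candidate' is accepted"
    else
        echo "'$candidate' is rejected"
    fi
done
```

In the full script this could be used as `is_non_negative_int "$k" || { echo "k must be a non-negative integer"; exit 2; }`.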

Answer 1 (score: 1)

You can also do it this way.

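#!/bin/bash

# The snippet as posted never set $file; here it is taken from the first argument.
file=$1

# Store "word count" pairs, one per array element.
while read -r line; do
    arr+=("$line")
done < <(tr ' ' '\n' < "$file" | sort | uniq -c | awk '{print $2" "$1}')

# Note: this filters on the occurrence count (more than 2), not on word length.
for a in "${arr[@]}"; do
    count=$(echo "$a" | awk '{print $2}')
    if (( count > 2 )); then
        echo "$a"
    fi
done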
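
The loop above compares the occurrence count, whereas the question asks about word length. A possible adaptation, sketched here under the assumption that the input is the `uniq -c` output from the question's pipeline, keeps only the words longer than k using `${#word}` and `-gt`. `filter_by_length` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: read "count word" lines and print those whose word is longer than k.
# filter_by_length is a hypothetical name, not part of either answer.
filter_by_length() {
    local k=$1 count word
    # read strips the leading blanks that uniq -c puts before the count.
    while read -r count word; do
        if [ "${#word}" -gt "$k" ]; then
            printf '%s %s\n' "$count" "$word"
        fi
    done
}

# Example: same pipeline shape as the question, followed by the filter.
printf 'ceva\nceva\ncel\nsmecher\n' | sort | uniq -c | sort -nr | filter_by_length 3
```

Because the filter runs after `sort -nr`, the frequency ordering of the remaining lines is preserved.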