I have a few words:
one
two
two
three
I have a file in which each of these words is repeated n times, in arbitrary order. For example, with n = 2 the file is:
one
two
two
three
two
three
two
one
The question is how to recover the original set of words (I know the value of n).
Note that the word "two" must appear twice in the result, so sort -u file.txt
or sort file.txt | uniq
is not the answer!
Answer 0: (score: 4)
This one-liner gives you the original lines, in no particular order:
awk -v n="2" '{a[$0]++}END{for(x in a)for(i=1;i<=a[x]/n;i++)print x}' file
n can be a variable; I hard-coded 2 here. With the current input file, the output is:
two
two
three
one
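The output is unordered because the input file alone gives no way of knowing the order of the "original" file, and awk's for (x in a) loop visits the keys in an unspecified order.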
#still n=2
kent$ cat f
one
one
one
one
three
three
two
two
two
two
two
two
kent$ awk -v n="2" '{a[$0]++}END{for(x in a)for(i=1;i<=a[x]/n;i++)print x}' f
three
two
two
two
one
one
#now n=4:
kent$ cat f
one
one
one
one
one
one
one
one
three
three
three
three
two
two
two
two
two
two
two
two
two
two
two
two
kent$ awk -v n="4" '{a[$0]++}END{for(x in a)for(i=1;i<=a[x]/n;i++)print x}' f
three
two
two
two
one
one
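If a deterministic order is wanted, one option is to remember when each distinct line is first seen and print in that order. The following is only a sketch and not part of the answer above: the function name recover_words and the awk arrays cnt and order are hypothetical, and it assumes every line occurs an exact multiple of n times.

```shell
#!/usr/bin/env bash
# recover_words N FILE: print each distinct line (count / N) times,
# in order of first appearance. Hypothetical helper for illustration.
recover_words() {
    awk -v n="$1" '
        !($0 in cnt) { order[++k] = $0 }   # remember first-seen order
        { cnt[$0]++ }                      # count every occurrence
        END {
            for (j = 1; j <= k; j++)
                for (i = 1; i <= cnt[order[j]] / n; i++)
                    print order[j]
        }' "$2"
}
```

Calling recover_words 2 file.txt on the eight-line file from the question should print one, two, two, three.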
Answer 1: (score: 1)
Another one:
n=2
inp="./in"
# uniq -c emits "count word"; print each word count/n times
while read -r cnt word
do
    for (( i = 0; i < cnt / n; i++ )); do printf '%s\n' "$word"; done
done < <(sort "$inp" | uniq -c)
which prints:
one
three
two
two
A perl variant (n hard-coded as 2):
perl -nE '$s{$_}++}{print "$_"x($s{$_}/2) for keys %s' < in
And finally, pure bash (4+):
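file="./in"
div=2
declare -A w      # word -> number of occurrences
order=()          # distinct words in order of first appearance
while IFS= read -r word
do
    [[ -z "${w[$word]}" ]] && order+=("$word")
    w[$word]=$(( ${w[$word]:-0} + 1 ))
done < "$file"
for word in "${order[@]}"
do
    cnt=$(( ${w[$word]} / div ))
    for (( i = 0; i < cnt; i++ ))
    do
        printf '%s\n' "$word"
    done
done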
This prints the words in the order in which they first appear in the input, e.g.:
one
two
two
three
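All of the approaches above rely on the question's premise that every line occurs an exact multiple of n times; otherwise the integer division silently drops the remainder. A quick sanity check before recovering the words might look like the sketch below. The helper name check_multiples is hypothetical, and the choice to report offending lines on stderr and return a non-zero status is an assumption made for illustration.

```shell
#!/usr/bin/env bash
# check_multiples N FILE: succeed only if every distinct line in FILE
# occurs a multiple of N times. Hypothetical helper for illustration.
check_multiples() {
    awk -v n="$1" '
        { cnt[$0]++ }                      # count every occurrence
        END {
            for (w in cnt)
                if (cnt[w] % n != 0) {
                    printf "%s: %d occurrences, not a multiple of %d\n", w, cnt[w], n > "/dev/stderr"
                    bad = 1
                }
            exit (bad ? 1 : 0)             # non-zero status if any line failed
        }' "$2"
}
```

For example, check_multiples 2 file.txt && echo ok would print ok only when the file is consistent with n = 2.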