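I'm trying to write a simple script that uses bash to find the longest word in a text file, along with its length. I know this is simple and straightforward with awk, but I want to try it this way. Say I have a=wmememememe
To find its length I can use echo ${#a}
and to print the word itself I would use echo ${a}
But I want to apply that to the following:
for i in `cat so.txt` do
where so.txt contains words. I hope that makes sense.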
Answer 0 (score: 21)
A bash one-liner:
cat YOUR_FILENAME | sed 's/ /\n/g' | sort | uniq | awk '{print length, $0}' | sort -nr | head
Yes, this will be slower than some of the other solutions here, but it also doesn't require remembering the semantics of bash for loops.
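One caveat with the one-liner: sed 's/ /\n/g' splits only on spaces, and it relies on sed treating \n in the replacement as a newline, which GNU sed does but some other implementations may not. Below is a minimal sketch of a variation, assuming it is acceptable to split on any whitespace with tr instead. The function name longest_words and its optional count argument are made up for illustration and are not part of the answer.

```shell
# Hypothetical helper (not from the answer): print the N longest distinct
# words in a file, longest first. N defaults to 10, matching plain `head`.
longest_words() {
    tr -s '[:space:]' '\n' < "$1" |       # one word per line, split on any whitespace
        sort -u |                         # drop duplicates (same effect as sort | uniq)
        awk 'NF { print length, $0 }' |   # prefix each non-empty word with its length
        sort -rn |                        # longest first
        head -n "${2:-10}"
}
```

Calling longest_words so.txt 1 would print only the longest word, with its length in front.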
Answer 1 (score: 12)
Normally you would use a while read loop rather than for i in $(cat), but since you want every word split out anyway, the for loop works fine in this case.
#!/bin/bash
longest=0
for word in $(<so.txt)
do
    len=${#word}
    if (( len > longest ))
    then
        longest=$len
        longword=$word
    fi
done
printf 'The longest word is %s and its length is %d.\n' "$longword" "$longest"
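For comparison, here is a rough sketch of the while read form mentioned above. It reads the file line by line and then splits each line into words. The function name longest_word is hypothetical, and the sketch sticks to POSIX constructs ([ ... ] rather than (( ... ))) so it should also run under plain sh. That is a choice made for this sketch, not something the answer requires.

```shell
# Hypothetical helper (not from the answer): while-read variant of the loop.
# The body runs in a subshell, so set -f and the variables stay local to it.
longest_word() (
    set -f                          # disable globbing so words like * stay literal
    longest=0
    longword=
    # The || [ -n "$line" ] part also handles a last line with no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        for word in $line; do       # unquoted on purpose: split the line on whitespace
            if [ "${#word}" -gt "$longest" ]; then
                longest=${#word}
                longword=$word
            fi
        done
    done < "$1"
    printf 'The longest word is %s and its length is %d.\n' "$longword" "$longest"
)
```

It would be called as longest_word so.txt. Unlike $(<so.txt), this never holds the whole file in memory at once, which may matter for very large inputs.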
Answer 2 (score: 5)
longest=""
for word in $(cat so.txt); do
    # keep the word if it is longer than the longest seen so far
    if [ ${#word} -gt ${#longest} ]; then
        longest=$word
    fi
done
echo "$longest"
Answer 3 (score: 4)
Another solution:
for item in $(cat "$infile"); do
    length[${#item}]=$item # use word length as index
done
maxword=${length[@]: -1} # select last array element
printf "longest word '%s', length %d\n" "${maxword}" "${#maxword}"
Answer 4 (score: 3)
An awk script:
#!/usr/bin/awk -f
# Initialize two variables
BEGIN {
    maxlength=0;
    maxword=""
}
# Loop through each word on the line
{
    for(i=1;i<=NF;i++)
        # Assign the maxlength variable if length of word found is greater. Also, assign
        # the word to maxword variable.
        if (length($i)>maxlength)
        {
            maxlength=length($i);
            maxword=$i;
        }
}
# Print out the maxword and the maxlength
END {
    print maxword,maxlength;
}
[jaypal:~/Temp] cat textfile
AWK utility is a data_extraction and reporting tool that uses a data-driven scripting language
consisting of a set of actions to be taken against textual data (either in files or data streams)
for the purpose of producing formatted reports.
The language used by awk extensively uses the string datatype,
associative arrays (that is, arrays indexed by key strings), and regular expressions.
[jaypal:~/Temp] ./script.awk textfile
data_extraction 15
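If a separate script file is not wanted, the same logic can probably be wrapped in a shell function that calls awk inline. This is only a sketch: the function name awk_longest and the shortened variable names are made up here, and it relies on awk treating an unset variable as 0 in a numeric comparison.

```shell
# Hypothetical helper (not from the answer): the awk logic as an inline program.
awk_longest() {
    awk '
        {
            for (i = 1; i <= NF; i++)          # every whitespace-separated field
                if (length($i) > max) {        # max starts out unset, i.e. 0
                    max = length($i)
                    word = $i
                }
        }
        END { print word, max }
    ' "$1"
}
```

Running awk_longest textfile on the sample above should give the same result as the script. As with the script, punctuation attached to a word counts toward its length.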
Answer 5 (score: 1)
for i in $(cat so.txt); do echo ${#i}; done | paste - so.txt | sort -n | tail -1
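Note that the paste step pairs the n-th length with the n-th line of so.txt, so the command above only lines up correctly when the file has exactly one word per line. Here is a sketch of one way around that, assuming a temporary file is acceptable. The function name longest_by_paste is hypothetical, and mktemp, while widely available, is not part of POSIX.

```shell
# Hypothetical helper (not from the answer): same length/paste/sort idea,
# but the input is first normalised to one word per line.
longest_by_paste() (
    tmp=$(mktemp) || exit 1
    # Split on any whitespace and drop the empty line that leading blanks would leave.
    tr -s '[:space:]' '\n' < "$1" | sed '/^$/d' > "$tmp"
    while IFS= read -r word; do
        echo "${#word}"                 # one length per word, in the same order
    done < "$tmp" | paste - "$tmp" | sort -n | tail -n 1
    rm -f "$tmp"
)
```

The output is the length, a tab, and the word, for example from longest_by_paste so.txt.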
Answer 6 (score: 0)
A modified POSIX shell version of jimis' xargs-based answer; it is still slow, taking two to three minutes:
tr "'" '_' < /usr/share/dict/words |
xargs -P$(nproc) -n1 -i sh -c 'set -- {} ; echo ${#1} "$1"' |
sort -n | tail | tr '_' "'"
Note that the tr bits at the beginning and end work around the trouble GNU xargs has with single quotes.
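As an alternative to swapping the quotes out and back in, the words can be handed to xargs NUL-separated, in which case xargs does no quote processing at all. The sketch below assumes an xargs that supports -0 and -P (GNU and BSD xargs do, but neither option is in POSIX) and a tr that understands '\0'. The function name longest_parallel and its optional job-count argument are made up for illustration.

```shell
# Hypothetical helper (not from the answer): one sh per word, run in parallel,
# with NUL-separated input so quotes in words need no special handling.
longest_parallel() {
    tr -s '[:space:]' '\n' < "$1" | sed '/^$/d' | tr '\n' '\0' |
        # The word arrives as $1 of the inner sh; "_" only fills $0.
        xargs -0 -n 1 -P "${2:-4}" sh -c 'printf "%s %s\n" "${#1}" "$1"' _ |
        sort -n | tail -n 1
}
```

For example, longest_parallel /usr/share/dict/words 8 would use up to 8 parallel jobs. Because the word is passed as an argument rather than substituted into the command string, no character in it is interpreted by the inner shell.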
Answer 7 (score: -1)
Slow because of the large number of forks, but pure shell, with no need for awk or special bash features:
$ cat /usr/share/dict/words | \
xargs -n1 -i sh -c 'echo `echo -n {} | wc -c` {}' | sort -n | tail
23 Pseudolamellibranchiata
23 pseudolamellibranchiate
23 scientificogeographical
23 thymolsulphonephthalein
23 transubstantiationalist
24 formaldehydesulphoxylate
24 pathologicopsychological
24 scientificophilosophical
24 tetraiodophenolphthalein
24 thyroparathyroidectomize
You can easily parallelize this, for example by passing -P4 to xargs to use 4 CPUs.