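I am trying to write a Linux bash script that will help me generate some statistics I need from a text file. Suppose the text file I am working with has the following format: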
"string : pathname1 : pathname2 : pathname3 : … pathnameN"
Here, pathname i is the full path of a file in which I found the given string. For example, such a file might look like this:
logile.txt
string : "version" pathname1: /home/Desktop/myfile.txt pathname2 : /usr/lib/tmp/sample.txt
string : "user" pathname1 : temp1/tmpfiles/user.txt pathname2 : newfile.txt pathname3 : /Downloads/myfiles/old/credentials.txt
string : "admin" pathname1 :
string: "build" pathname1 : Documents/projects/myproject/readme.txt pathname2 : Desktop/readmetoo.txt
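In this example, I would like my bash script to tell me that I searched for a total of 4 words (version, user, admin, build) and that the word found in the most files is "user", which was found in 3 files. Is using the awk command a good approach? I am not familiar with bash scripting, so any help would be useful! Thanks!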
Answer 0 (score: 0)
This is not an easy task, and it could probably be done in awk alone, but you asked for a bash script. The following bash script:
#!/bin/bash
set -euo pipefail
wordcount=0
tmp=""
# read each input line into three fields:
# the literal word "string", the quoted search word, and the rest
while read -r string name rest; do
    # skip any line whose first field is not the literal word "string"
    if [ "$string" != string ]; then
        continue
    fi
    # remove '"' from name
    name=${name//\"}
    # the rest has the format: pathname<i> <path> pathname<i> <path> ...
    # put every field on its own line; tr -s squeezes runs of spaces,
    # so no empty lines are produced and miscounted as paths
    # then drop the pathname<i> labels so that only the paths remain
    rest=$(echo "$rest" | tr -s ' ' '\n' | grep -v '^pathname[0-9]*$' || :)
    # count the non-empty lines in rest (0 when no path was listed)
    restcnt=$(grep -c . <<<"$rest" || :)
    # save restcnt and name on a single line in tmp to parse later
    tmp+="$restcnt $name"$'\n'
    # increment wordcount
    wordcount=$((wordcount + 1))
# feed the while read loop from a process substitution
done < <(
    # cat stdin or the files given as script arguments
    # and replace every colon ':' with a space
    cat "$@" | sed 's/:/ /g'
)
# sort the tmp string numerically from lowest to highest
# and take the last line (i.e. the highest count)
tmp=$(echo "$tmp" | sort -n | tail -n1)
# split tmp into two variables - the count and the word
read -r mostcnt mostfile <<<"$tmp"
# and print the information for the user
echo "You searched for a total of $wordcount words."
echo "The word that was found in most files was: $mostfile"
echo " which was found in $mostcnt files."
...used with the following logile.txt...
string : "version" pathname1:/home/Desktop/myfile.txt pathname2 : /usr/lib/tmp/sample.txt
string : "user" pathname1: temp1/tmpfiles/user.txt pathname2: newfile.txt pathname3: /Downloads/myfiles/old/credentials.txt
string : "admin" pathname1 :
string: "build" pathname1: Documents/projects/myproject/readme.txt pathname2:Desktop/readmetoo.txt
...produces the following results:
$ /tmp/1.sh <./logile.txt
You searched for a total of 4 words.
The word that was found in most files was: user
 which was found in 3 files.
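Since the question asks whether awk is a good approach, here is a minimal sketch of the same counting done in a single awk program called from bash. The function name summarize is hypothetical, and the sketch assumes, like the script above, that every record sits on one line and that paths contain neither spaces nor colons. When several words tie for the highest count, it reports the first one it saw.

```shell
#!/bin/bash
# Hypothetical helper: summarize the log read from stdin,
# or from the files given as arguments.
summarize() {
    awk '
    # treat every colon as whitespace; awk re-splits the fields afterwards
    { gsub(/:/, " ") }

    # only handle records whose first field is the literal word "string"
    $1 == "string" {
        word = $2
        gsub(/"/, "", word)          # strip the quotes around the word

        # every remaining field that is not a pathname<i> label is a path
        count = 0
        for (i = 3; i <= NF; i++)
            if ($i !~ /^pathname[0-9]*$/)
                count++

        words++
        # remember the word with the highest path count seen so far
        if (words == 1 || count > best) {
            best = count
            bestword = word
        }
    }

    END {
        printf "You searched for a total of %d words.\n", words
        if (words > 0) {
            printf "The word that was found in most files was: %s\n", bestword
            printf " which was found in %d files.\n", best
        }
    }' "$@"
}

# Usage with the sample data from the question
summarize <<'EOF'
string : "version" pathname1: /home/Desktop/myfile.txt pathname2 : /usr/lib/tmp/sample.txt
string : "user" pathname1 : temp1/tmpfiles/user.txt pathname2 : newfile.txt pathname3 : /Downloads/myfiles/old/credentials.txt
string : "admin" pathname1 :
string: "build" pathname1 : Documents/projects/myproject/readme.txt pathname2 : Desktop/readmetoo.txt
EOF
```

For the sample data this reports 4 words, with "user" found in 3 files. Doing the field splitting inside awk avoids starting tr, grep and wc once per input line, which is the main practical difference from the while read loop.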