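I want to print the text of every attribute that matches value="any string", but not value="#{any string}", from all files in a directory and its subdirectories.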
dir1
file1
( content like ..... value="GOD Grace" .....
....................................value="#{blog}"......
... value="Greek" ...)
file2
( content like ..... value="Sounder rajan" .....
....................................value="#{feek}".....
....................................value="patient"....)
subdir1
file3
( content like ..... value="Guice" .....
....................................value="#{slog}"......
... value="guide" ...)
I would like output like this:
filename filewordno wordsExtract uniqno
file1 1 GOD Grace 1
file1 2 Greek 2
file2 1 Sounder rajan 3
file2 2 patient 4
file3 1 Guice 5
file3 2 guide 6
My attempt:
no=0;
for SourceFile in *.xhtml
do
pagename=$(basename $filename .xhtml)
cat $SourceFile | gawk 'BEGIN {FS="[ \"]"}
wno=0;
/value=/ && !/value=\"#/ && !/pages/ && !/value=\"[0-9]\"/ {
for (i=1; i<NF; i++) {
if (( !/#/ && /value=/ ) && $i == "value=" && $(i+1)!="" && $(i+1)!=":" && $(i+1)!="*" ){
print SourceFile,++wno,$(i+1),++no;
}
}
}'
done >> path/Outputfilename
My output:
filename filewordno wordsExtract uniqno
- 1 Grace 1
- 1 Greek 2
- 1 Sounder 1
- 1 patient 2
My 3 questions
I have been learning and working on this for a week. If you have time, your help would be greatly appreciated.
Thanks
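Answer 0 (score: 0)
I think a simple grep pipeline inside a loop can do what you want. If you can accept a solution that does not parse with awk, see the script and its output below; I used the same content as in your question.
Script (extract_values.sh)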
#!/bin/bash
# Loop over all .xhtml files, recursively, below the current directory.
# Note: word splitting on the find output breaks on paths that contain whitespace.
v_currdir=$(pwd) # remember the starting directory
for file in $(find . -type f -name "*.xhtml" -print)
do
    v_file_path=$(dirname "$file") # directory part of the path
    v_file_name=$(basename "$file") # file name part of the path
    cd "$v_file_path" || continue # enter that directory so grep -H prints the bare file name
    # Extract each value="..." attribute, drop the value="#..." ones, strip the
    # value= prefix, and number the remaining lines per file (grep -n '' matches every line).
    grep -o -H 'value="[^"]*"' "$v_file_name" | grep -v 'value="#' | sed 's/value=//' | grep -n ''
    # Return to the starting directory for the next iteration.
    cd "$v_currdir" || exit 1
done
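To see what each stage of that pipeline contributes, here is a minimal sketch run against a single made-up file. The markup around the attributes and the temporary directory are assumptions for illustration only. The sketch matches with `[^"]*` so that each attribute is picked up separately even when one line holds several, and it numbers lines with `grep -n ''`; both are choices of this example.

```shell
# Hypothetical sample data, written to a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' \
    '<h:outputText value="GOD Grace"/>' \
    '<h:outputText value="#{blog}"/>' \
    '<h:outputText value="Greek"/>' > "$tmpdir/file1.xhtml"

# Stage 1: grep -o -H prints each value="..." match, prefixed with the file name.
# Stage 2: grep -v drops the value="#{...}" expressions.
# Stage 3: sed strips the value= prefix.
# Stage 4: grep -n '' matches every line and so numbers the survivors.
pipeline_output=$(cd "$tmpdir" && grep -o -H 'value="[^"]*"' file1.xhtml \
    | grep -v 'value="#' \
    | sed 's/value=//' \
    | grep -n '')

printf '%s\n' "$pipeline_output"
rm -rf "$tmpdir"
```

For this sample it prints `1:file1.xhtml:"GOD Grace"` and `2:file1.xhtml:"Greek"`, which is the colon-separated form that the `nl` and `tr` step in the sample run turns into columns.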
Sample run of the script
$ ls
dir1 extract_values.sh
$ find . -print
.
./dir1
./dir1/file1.xhtml
./dir1/file2.xhtml
./dir1/subdir1
./dir1/subdir1/file3.xhtml
./extract_values.sh
$ # The command below runs the script, adds a header line,
$ # and converts the colon delimiters to tabs with tr
$ (echo "UniqNO:FWordNo:FileName:WordsExtract"; extract_values.sh | nl -s: ) | tr ':' "\t"
UniqNO FWordNo FileName WordsExtract
1 1 file1.xhtml "GOD Grace"
2 2 file1.xhtml "Greek"
3 1 file2.xhtml "Sounder rajan"
4 2 file2.xhtml "patient"
5 1 file3.xhtml "Guice"
6 2 file3.xhtml "guide"
$
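If an awk-based version is preferred after all, the following is a rough sketch rather than a tested drop-in. It emits the columns in the order the question asks for (filename, filewordno, wordsExtract, uniqno). The function name `extract_values` and the sample tree are hypothetical, and `-print0`, `sort -z` and `xargs -0` are assumed to be available (they are in GNU and BSD tools, but they are not POSIX).

```shell
# extract_values DIR: hypothetical helper, not part of the original answer.
extract_values() {
    find "$1" -type f -name '*.xhtml' -print0 | sort -z | xargs -0 awk '
        BEGIN { OFS = "\t"; print "filename", "filewordno", "wordsExtract", "uniqno" }
        # New file: restart the per-file counter and derive the bare file name.
        FNR == 1 { wno = 0; name = FILENAME; sub(/.*\//, "", name); sub(/\.xhtml$/, "", name) }
        {
            line = $0
            # Consume one value="..." attribute at a time.
            while (match(line, /value="[^"]*"/)) {
                word = substr(line, RSTART + 7, RLENGTH - 8)   # text between the quotes
                line = substr(line, RSTART + RLENGTH)          # continue after this match
                # Skip empty values and #{...} expressions; uno runs across all files.
                if (word != "" && word !~ /^#/) print name, ++wno, word, ++uno
            }
        }'
}

# Sample tree mirroring the question (the markup around the attributes is made up).
demo=$(mktemp -d)
mkdir -p "$demo/dir1/subdir1"
printf '%s\n' '<i value="GOD Grace"/> <i value="#{blog}"/>' '<i value="Greek"/>' > "$demo/dir1/file1.xhtml"
printf '%s\n' '<i value="Sounder rajan"/>' '<i value="#{feek}"/>' '<i value="patient"/>' > "$demo/dir1/file2.xhtml"
printf '%s\n' '<i value="Guice"/>' '<i value="#{slog}"/>' '<i value="guide"/>' > "$demo/dir1/subdir1/file3.xhtml"

result=$(extract_values "$demo/dir1")
printf '%s\n' "$result"
rm -rf "$demo"
```

Because a single awk process reads all the files, the per-file counter resets on `FNR == 1` while the overall counter keeps running. For a very large tree, xargs may split the file list across several awk runs, which would repeat the header and restart the overall numbering.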