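Bash help: recursively grepping a directory using a file

Time: 2016-10-07 21:15:42

Tags: bash shell grep

I have a txt file named hitlist.txt (it is updated regularly) that contains a list of words/strings I want to grep a directory against. It looks like this: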

# This is just a comment and will not be part of the search
* Blah - this is a category  
foo
bar
sibilance

# A new category
* Meh - another category
snakefish
sex panther

My list is usually > 100 strings, each on its own line. Today, because of a deadline, I just went through the list and ran the following command for each word:

    find . -iname "*" -type f -print0 | xargs -0 grep -HniI "foo" >> results.txt

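As the command above shows, I am interested in the file path and name as well as the line containing the matching text. The file holds several category lists (each denoted by *), and I would like to be able to run my script against one, several, or all categories.

I would also like an option to turn off the -i flag, so that the search becomes case-sensitive. I have a script that recursively finds/lists all files in a directory, plus the command I used above. Finally, the hitlist format can be changed completely if needed.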

3 answers:

Answer 0 (score: 1)

Set up a ghl() (grep hitlist) shell function to do the work (it depends on GNU grep's -o switch, plus a sed loop). Its output is a word list from hitlist.txt (or <filename>):

    # usage ghl <glob> <filename>
    ghl() { grep -o '\* '"$1"' -' "$2" | grep -o '[[:alpha:]]*' | \
        while read x ; do \
            sed -n '/\* '"$1"'/{:show ;n;/^[^ ]/{p;b show;}}' "$2" ; \
        done ; }

The word list that ghl outputs for a .*ah glob (which matches the Blah category) is fed to grep -f -, plus some ad hoc bash process substitution to generate input text:

    ghl '.*ah' hitlist.txt | grep -i -f - <(echo bar) <(echo foo) <(echo Foo)

Output:

    /dev/fd/63:bar
    /dev/fd/62:foo
    /dev/fd/61:Foo

The second grep above can be passed switches as needed (see man grep). For example, the same thing but case-sensitive (i.e. with the -i switch removed):

    ghl '.*ah' hitlist.txt | grep -f - <(echo bar) <(echo foo) <(echo Foo)

Output (note the missing uppercase item):

    /dev/fd/63:bar
    /dev/fd/62:foo

Since grep already has options to handle recursive searches, the rest is just a matter of adding switches as needed.
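
As a rough illustration of that last step, the sketch below pipes a word list into a recursive grep. The function name search_words is hypothetical, and treating the words as literal strings (-F) is an assumption; the answer itself uses plain regex matching. It assumes a grep that supports -r and -I, such as GNU grep.

```shell
#!/usr/bin/env bash
# search_words DIR [extra grep switches...]
# Reads one search term per line on stdin and greps DIR recursively.
# Hypothetical helper for illustration; not part of the original answer.
search_words() {
    local dir=$1
    shift
    # Drop blank lines first: an empty pattern would match every line.
    grep -v '^[[:space:]]*$' |
        # -r recurse, -H show file name, -n show line number,
        # -I skip binary files, -F treat terms as literal strings (assumption).
        # Any extra switches (for example -i) are passed through via "$@".
        grep -r -H -n -I -F "$@" -f - -- "$dir"
}
```

With the ghl function from this answer, a case-insensitive search of a directory might then look like: ghl '.*ah' hitlist.txt | search_words ./dirtoscrub -i. Leaving off -i gives the case-sensitive variant.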

Answer 1 (score: 0)

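Your question is rather vague, but I imagine this is more or less what you want.

    awk -v cat='Blah|Meh' 'NR==FNR && /^#/ { next } # Skip comments
        NR==FNR && /^\*/ { if ($0~cat) c=1; else c=0; next }
        NR==FNR { if(c) a[tolower($0)]=1; next }
        tolower($0) in a { print FILENAME ":" FNR ":" $0 }' Hits.txt files to search

Here "files to search" stands for the list of files to scan. It should be fairly obvious how to selectively disable tolower() and how to wire this up so that the file names following Hits.txt come from find.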
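
One possible way to do that wiring is sketched below. The function name awk_hits and the icase variable are hypothetical. The sketch also swaps the whole-line "in" test for a substring test with index(), on the assumption that the asker wants lines that contain a word. Unlike grep -I, awk does not skip binary files, and the hitlist is assumed to be non-empty and to live outside the searched directory.

```shell
#!/usr/bin/env bash
# awk_hits CATEGORY_REGEX ICASE HITLIST DIR
#   ICASE=1 ignores case, ICASE=0 is case-sensitive.
# Hypothetical helper for illustration; not part of the original answer.
awk_hits() {
    local cats=$1 icase=$2 hitlist=$3 dir=$4
    # xargs appends the found files after the hitlist, so awk reads the
    # hitlist first (NR==FNR) and then every file under DIR.
    find "$dir" -type f -print0 |
        xargs -0 awk -v cat="$cats" -v icase="$icase" '
            NR==FNR && /^#/  { next }                  # skip comments
            NR==FNR && /^\*/ { c = ($0 ~ cat); next }  # category header
            NR==FNR {                                  # word in a category
                if (c && NF) words[icase ? tolower($0) : $0] = 1
                next
            }
            {                                          # a line in a searched file
                line = icase ? tolower($0) : $0
                for (w in words)
                    if (index(line, w)) {
                        print FILENAME ":" FNR ":" $0
                        break
                    }
            }' "$hitlist"
}
```

For example, awk_hits 'Blah|Meh' 1 Hits.txt dirtoscrub searches both categories without regard to case, and passing 0 instead of 1 makes the match case-sensitive.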

Answer 2 (score: 0)

This is what I ended up with:

Hitlist format:

# MEH
never,going,to give,you up

# blah
word to,your,mother

Script:

#!/usr/bin/env bash

# Set defaults
OUTPUT_FILE="hits.txt"
HITLIST_FILE="hitlist.txt"

# Hold on to the args
ARGLIST=("$@")

# Declare any functions
help ()
{
    echo "--------------------------------- Luffa --------------------------------"
    echo "Usage: luffa.sh [DIRTOSCRUB]"
    echo ""
    echo "Searches DIRTOSCRUB for category specific words in $HITLIST_FILE."
    echo ""
    echo "EXAMPLE: luffa.sh dirtoscrub"
    echo ""
    echo "--help                     display this help and exit"
    echo "--version                  display version information and exit"
}

version ()
{
    echo "luffa.sh v1.0" 
}

process () 
{
    if [ "${#FILEARG}" -lt 1 ] # check that a directory argument was given
    then
        echo "ERROR: Specify directory to be searched."
        help
        exit 1
    else
        SEARCH_DIR="${ARGLIST[0]}"
    fi

    echo ""
        echo "--------------------------------------------------------- Luffa ---------------------------------------------------" | tee -a "$OUTPUT_FILE"
        echo "search command: find [DIRTOSCRUB] -type f -print0 | xargs -0 grep -HniI --color=always \$word | tee -a ../hits.txt | more" | tee -a "$OUTPUT_FILE"
        echo
        echo "                                                     .,,:::::." | tee -a "$OUTPUT_FILE"                   
        echo "                                                  .,,::::~:::::.." | tee -a "$OUTPUT_FILE"               
        echo "                                                ,,::::~~~~~~::~~:::." | tee -a "$OUTPUT_FILE"            
        echo "                                              ,:,:~:~~~~~~~~~~~~~~::." | tee -a "$OUTPUT_FILE"           
        echo "                                            ,,:::~:~~~~~~~~~~~~~~~~~~," | tee -a "$OUTPUT_FILE"          
        echo "                                        .,,::::~~~~~~~~~~~~~~~~~~~~~~::" | tee -a "$OUTPUT_FILE"         
        echo "                                      .,::~:~~~~~=~~~~=~~~~~~~~~~~=~~~~." | tee -a "$OUTPUT_FILE"        
        echo "                                    ,::::~~:~~~=~~~~~~~~=~~=~~~===~~~~~~." | tee -a "$OUTPUT_FILE"       
        echo "                                ..:::~~~~=~~=~~~~~~=~~~~=~~===~~==~~~~~~," | tee -a "$OUTPUT_FILE"       
        echo "                              .,:::~~~~~~~~~~~~~~~~=~=~~~=~====~===~~~~~~~." | tee -a "$OUTPUT_FILE"     
        echo "                            .,::~~~~~~~~~~~~~~=~=~~~~~=~======~=~~~~=~=~~~:" | tee -a "$OUTPUT_FILE"    
        echo "                         ..,::~:~~~~~~=~~~=~~~~~~~~=~====+======~===~~~~~~~." | tee -a "$OUTPUT_FILE"    
        echo "                       ..,:,:~~~~~~=~::~~=~=~~~=~~=~=~=~======~~~==~~~~~~::." | tee -a "$OUTPUT_FILE"    
        echo "                        ,,.::~:=~~~~~~~~~~~~=~=~===~~~====+==~=====~~~~~::,." | tee -a "$OUTPUT_FILE"    
        echo "                        ,,,,:I++=:~==~=~~~~~~=~:==~=~+~====~=~===~~~~:~::,:" | tee -a "$OUTPUT_FILE"     
        echo "                       .,:+++?77+?=~~~~=~~=~=~~=~~+=~+~~+====~=~~~:::::,::," | tee -a "$OUTPUT_FILE"     
        echo "                      ..++++?++?II?=~~=~~~=~~~====~===~=====~~~:~::::::::,." | tee -a "$OUTPUT_FILE"     
        echo "                    ..=++?++++++???7+~~~~~~~~+~=~=====~~~~~~~~~::::~:::,,.." | tee -a "$OUTPUT_FILE"     
        echo "                   .=+++++++++++++++===:~~=~==+~~=~=~~:~~=~:~:::~::::,,.." | tee -a "$OUTPUT_FILE"       
        echo "                  .++++++?++++++?++=?~:~~~~===~==~==~~~~~:::::::::,,,..." | tee -a "$OUTPUT_FILE"        
        echo "               ..=?+++++??+++++++===~::~~~~~~=~~~~~~:~~:::::,:,,,,,." | tee -a "$OUTPUT_FILE"            
        echo "            ...=+?+++++++++=====~:,::,~:::~~~~~:~~~~::::~::::,,,,.." | tee -a "$OUTPUT_FILE"             
        echo "          .=+++++++++++===~==::::,::~~,,,::~~~~~~::::::~:,:,,.." | tee -a "$OUTPUT_FILE"                
        echo "        ..++++++++++=+===~,.,,:::,:~~~~~,.,:~:~::::::,::,:,.." | tee -a "$OUTPUT_FILE"                   
        echo "    ...++?++++++++=+=~~.   ..,,,,,:,~,::~,:::,:,:,~::::,,.." | tee -a "$OUTPUT_FILE"                    
        echo "   .++++++++?++====~.       ...,,:,~::~=::,::,:,:::,,,,.." | tee -a "$OUTPUT_FILE"                        
        echo ".++?+++++?++++==~..           .,.:,,:::~,:,,,:::::,,,." | tee -a "$OUTPUT_FILE"                         
        echo "++++++?+???==~=.               ...,::~~~:,,:,:::,,." | tee -a "$OUTPUT_FILE"                           
        echo "?+++?????+==~.                   ..,,,,::,:,,,,,." | tee -a "$OUTPUT_FILE"                              
        echo "+?+++??+==~.                       ..,,,,,,,,." | tee -a "$OUTPUT_FILE"                                 
        echo "+I???+==~.                           ..,,.." | tee -a "$OUTPUT_FILE"                                     
        echo "??++==~." | tee -a "$OUTPUT_FILE"                                                                       
        echo "+===~." | tee -a "$OUTPUT_FILE"                                                                         
        echo "+=~." | tee -a "$OUTPUT_FILE"                                                                           
        echo "~" | tee -a "$OUTPUT_FILE"                             
        echo "--------------------------------------------------------------------------------------------------------------------------" | tee -a "$OUTPUT_FILE"

        echo "" | tee -a "$OUTPUT_FILE"

        # Loop through hitlist; the "|| [[ -n ... ]]" part also picks up a
        # final line that has no trailing newline
        while read -r hitListWords || [[ -n "$hitListWords" ]]
        do

            # Blank lines are skipped; a line starting with "#" names a category
            if [ "${hitListWords:0:1}" != "#" ]; then

                if [ -n "$hitListWords" ]; then

                    # Parse comma delimited category specific hitlist
                    IFS=',' read -ra categoryWords <<< "$hitListWords"

                    # Search for occurrences/hits for the hitList word
                    for categoryWord in "${categoryWords[@]}"; do
                        echo "---------------------------------------------------" | tee -a "$OUTPUT_FILE"
                        echo "$category - \"$categoryWord\"" | tee -a "$OUTPUT_FILE"
                        echo "---------------------------------------------------" | tee -a "$OUTPUT_FILE"
                        # First pass appends plain hits to the output file,
                        # second pass pages colored hits on the terminal
                        find "$SEARCH_DIR" -type f -print0 | xargs -0 grep -HniI -e "$categoryWord" >> "$OUTPUT_FILE"
                        find "$SEARCH_DIR" -type f -print0 | xargs -0 grep -HniI --color=always -e "$categoryWord" | more
                        echo "" | tee -a "$OUTPUT_FILE"
                    done

                fi

            else

                # Remember the current category name (the text after "#")
                category="$(echo "$hitListWords" | cut -d "#" -f 2)"

            fi

        done < "$HITLIST_FILE"

        exit $?
}

# Process the options
while [[ "${ARGLIST[0]}" == -* ]]; do
    OPTION="${ARGLIST[0]}"
    NUM_OPTS=1;

    case $OPTION in
    --version)
        version
        exit 0
        ;;
    --help)
        help
        exit 0
        ;;
    *)
        help
        exit 1  
        ;;
    esac

    ARGLIST=("${ARGLIST[@]:$NUM_OPTS}")

done

FILEARG="${ARGLIST[*]}"
process
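
The script above always searches every category and always ignores case. A minimal sketch of how category selection and a case-sensitivity switch could be added on top of the same comma-delimited hitlist format is shown below. The function name luffa_search and the -s and -c options are hypothetical, category names are compared exactly, and -F (literal strings) is an assumption that the original script does not make.

```shell
#!/usr/bin/env bash
# luffa_search [-s] [-c CATEGORY]... DIR HITLIST
#   -s           search case-sensitively (the default ignores case)
#   -c CATEGORY  only search this category; repeat for several; omit for all
# Hypothetical helper for illustration; not part of the original script.
luffa_search() {
    local OPTIND=1 opt line category="" word want keep
    local -a wanted=() words=()
    local -a grep_opts=(-H -n -I -F -i)   # -F (literal strings) is an assumption

    while getopts 'sc:' opt; do
        case $opt in
            s) grep_opts=(-H -n -I -F) ;;   # same switches without -i
            c) wanted+=("$OPTARG") ;;
            *) return 2 ;;
        esac
    done
    shift $((OPTIND - 1))
    local dir=$1 hitlist=$2

    while read -r line || [[ -n $line ]]; do
        [[ -z $line ]] && continue

        if [[ $line == '#'* ]]; then
            # "# MEH" -> "MEH": drop the "#" and any leading whitespace
            category=${line#'#'}
            category=${category#"${category%%[![:space:]]*}"}
            continue
        fi

        # Skip this line unless its category was requested (exact name match)
        if ((${#wanted[@]} > 0)); then
            keep=0
            for want in "${wanted[@]}"; do
                [[ $want == "$category" ]] && keep=1
            done
            ((keep)) || continue
        fi

        # Split the comma-delimited words and search for each one
        IFS=',' read -ra words <<< "$line"
        for word in "${words[@]}"; do
            find "$dir" -type f -print0 |
                xargs -0 grep "${grep_opts[@]}" -e "$word"
        done
    done < "$hitlist"
}
```

A call such as luffa_search -c MEH -c blah dirtoscrub hitlist.txt >> hits.txt would search two categories while ignoring case, and adding -s makes the same search case-sensitive.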