GREP command in a loop

Time: 2015-03-02 23:30:00

Tags: linux bash grep

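I have about 3000 files in a folder. My files contain data like the following:

VISITERM_0 VISITERM_20 VISITERM_35 ..... and so on

The files do not all contain the same values as shown above. The values range from 0 to 99.

I want to know how many files in the folder contain each VISITERM. For example, if VISITERM_0 is present in 300 of the files in the folder, then I need it to print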

VISITERM_0  300

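Similarly, if 1000 files contain VISITERM_1, I need it to print     VISITERM_1 1000

So, I want to print each VISITERM along with the number of files that contain it, starting from VISITERM_0 through VISITERM_99.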

I used the grep command
 grep VISITERM_0 * -l | wc -l

However, this handles only a single term, and I want to loop it from VISITERM_0 to VISITERM_99. Please help!

2 Answers:

Answer 0: (score: 1)

#!/bin/bash
# ^^- the above is important; #!/bin/sh would allow only POSIX syntax

# use a C-style for loop, which is a bash extension
for ((i=0; i<100; i++)); do
  # Calculate number of matching files (-w: whole words only, so VISITERM_1 does not match VISITERM_10)...
  num_matches=$(find . -type f -exec grep -l -w -e "VISITERM_$i" '{}' + | wc -l)
  # ...and print the result.
  printf 'VISITERM_%d\t%d\n' "$i" "$num_matches"
done
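
A detail worth checking in any loop of this kind: as a plain pattern, VISITERM_1 is also a substring of VISITERM_10 through VISITERM_19, so substring matching would inflate the counts for the single-digit terms. The sketch below assumes a grep that supports the widely available -w (whole word) option, as GNU and BSD grep do. The helper name count_files_with_term is hypothetical and not part of the answer.

```shell
#!/bin/bash
# Hypothetical helper: print how many of the given files contain TERM
# as a whole word. Usage: count_files_with_term TERM FILE...
count_files_with_term() {
  local term=$1
  shift
  # -l prints each matching file name once; -w rejects matches that are
  # followed by another word character (letters, digits, underscore),
  # so VISITERM_1 does not match inside VISITERM_10.
  # tr strips the padding that some wc implementations add.
  grep -l -w -e "$term" -- "$@" 2>/dev/null | wc -l | tr -d ' '
}

# Example call (commented out; assumes the data files are in the current directory):
#   for ((i=0; i<100; i++)); do
#     printf 'VISITERM_%d\t%d\n' "$i" "$(count_files_with_term "VISITERM_$i" *)"
#   done
```

Because the underscore counts as a word character, -w treats VISITERM_1 as a single word, which is what makes this check work without a more elaborate regular expression.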

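Answer 1: (score: 1)

Here is a GNU awk solution (GNU because RS is more than one character). Note that it counts occurrences, so a term repeated within one file is counted each time: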

awk -v RS=" |\n" '{n=split($1,a,"VISITERM_");if (n==2 && a[2]<100) b[a[2]]++} END {for (i in b) print "VISITERM_"i,b[i]}' *

Example:

cat file1
VISITERM_0 VISITERM_320 VISITERM_35

cat file2
VISITERM_0 VISITERM_20 VISITERM_32
VISITERM_20 VISITERM_42 VISITERM_11

Gives:

awk -v RS=" |\n" '{n=split($1,a,"VISITERM_");if (n==2 && a[2]<100) b[a[2]]++} END {for (i in b) print "VISITERM_"i,b[i]}' file*
VISITERM_0 2
VISITERM_11 1
VISITERM_20 2
VISITERM_32 1
VISITERM_35 1
VISITERM_42 1

How it works:

awk -v RS=" |\n" '              # Set record separator to space or newline
    {n=split($1,a,"VISITERM_")  # Split record on "VISITERM_" and store the number of pieces in "n"
    if (n==2 && a[2]<100)       # If "n" is 2 (record contains "VISITERM_") and the number is less than 100
        b[a[2]]++}              # Count each occurrence of that number and store it in array "b"
END {for (i in b)               # Walk through array "b"
    print "VISITERM_"i,b[i]}    # Print the counts
' file*                         # Read the files
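
The question asks for the number of files containing each term, whereas an occurrence counter adds one for every repetition, including repeats inside a single file. As a hedged alternative sketch, not part of the original answer, the variant below remembers each (file, term) pair so that a file contributes at most once per term. It assumes the terms are whitespace-separated tokens of the form VISITERM_<digits>. The function name count_files_per_term and the arrays seen and count are illustrative.

```shell
#!/bin/bash
# Hypothetical helper: for each VISITERM_0..VISITERM_99, print the number
# of FILES (not occurrences) that contain it. Usage: count_files_per_term FILE...
count_files_per_term() {
  awk '
    {
      for (f = 1; f <= NF; f++) {
        # Accept only whole tokens such as VISITERM_42
        if ($f ~ /^VISITERM_[0-9]+$/) {
          id = substr($f, 10) + 0          # digits after the 9-character prefix
          # Count a term only the first time it is seen in the current file
          if (id < 100 && !((FILENAME, id) in seen)) {
            seen[FILENAME, id] = 1
            count[id]++
          }
        }
      }
    }
    END {
      # Print in numeric order; terms that never occur are skipped
      for (id = 0; id < 100; id++)
        if (id in count) print "VISITERM_" id, count[id]
    }
  ' "$@"
}

# Example call (commented out): count_files_per_term file*
```

With the file1 and file2 shown in the example, this variant would report VISITERM_20 as 1 rather than 2, since both occurrences are in file2. It uses the default record separator, so it does not depend on a multi-character RS.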

PS
If everything is on a single line, change it to RS=" ". It should then work with most awk implementations.