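I have about 3000 files in a folder. The files contain data like this:
VISITERM_0 VISITERM_20 VISITERM_35 ... and so on
Not every file contains the same values as above; the numbers range from 0 to 99.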
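I want to know how many files in the folder contain each VISITERM. For example, if VISITERM_0 appears in 300 of the files, I need it to print
VISITERM_0 300
Similarly, if 1000 files contain VISITERM_1, I need it to print VISITERM_1 1000
So I want to print every VISITERM from VISITERM_0 through VISITERM_99 together with the number of files that contain it.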
I used the grep command
grep VISITERM_0 * -l | wc -l
but that only handles a single term. I want to loop it from VISITERM_0 to VISITERM_99. Please help!
Answer 0 (score: 1)
#!/bin/bash
# ^^- the above is important; #!/bin/sh would allow only POSIX syntax
# use a C-style for loop, which is a bash extension
for ((i=0; i<100; i++)); do
    # Count the files containing the term. -w matches whole words only, so
    # VISITERM_1 does not also match VISITERM_10 ... VISITERM_19.
    num_matches=$(find . -type f -exec grep -lw -e "VISITERM_$i" '{}' + | wc -l)
    # ...and print the result.
    printf 'VISITERM_%d\t%d\n' "$i" "$num_matches"
done
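The loop above runs grep 100 times over every file. If that is slow for ~3000 files, a single-pass alternative could look like the sketch below (a variant, not part of the original answer). It assumes a grep that supports -r, -o, -w and -H (GNU and BSD grep do) and file paths without colons, because cut splits on ':'. The function name count_files_per_term is hypothetical. Unlike the loop, it prints nothing for terms that occur in no file.

```shell
# Hypothetical helper: count_files_per_term DIR
# Prints "VISITERM_n<TAB>count", where count is the number of files under DIR
# that contain VISITERM_n (n = 0..99) at least once, in numeric order.
# Assumes no file path contains a colon.
count_files_per_term() {
  # -o: one matched term per line; -w + {1,2}: reject VISITERM_320 etc.
  # -H: prefix each match with its file name; -r: recurse into DIR
  grep -rowH 'VISITERM_[0-9]\{1,2\}' "$1" |
    sort -u |                 # keep one line per (file, term) pair
    cut -d: -f2 |             # drop the file name, keep the term
    sort -t_ -k2,2n |         # numeric order on the number after "_"
    uniq -c |                 # count how many files had each term
    awk '{ print $2 "\t" $1 }'  # swap to "term<TAB>count"
}
```

From inside the folder, run it as `count_files_per_term .`.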
Answer 1 (score: 1)
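Here is a GNU awk solution (GNU because RS contains more than one character; GNU awk treats such an RS as a regular expression, while other awks may use only its first character):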
awk -v RS=" |\n" '{n=split($1,a,"VISITERM_");if (n==2 && a[2]<100) b[a[2]]++} END {for (i in b) print "VISITERM_"i,b[i]}' *
Example:
cat file1
VISITERM_0 VISITERM_320 VISITERM_35
cat file2
VISITERM_0 VISITERM_20 VISITERM_32
VISITERM_20 VISITERM_42 VISITERM_11
Gives:
awk -v RS=" |\n" '{n=split($1,a,"VISITERM_");if (n==2 && a[2]<100) b[a[2]]++} END {for (i in b) print "VISITERM_"i,b[i]}' file*
VISITERM_0 2
VISITERM_11 1
VISITERM_20 2
VISITERM_32 1
VISITERM_35 1
VISITERM_42 1
How it works:
awk -v RS=" |\n" ' # Set the record separator to a space or a newline
{n=split($1,a,"VISITERM_") # Split the record on "VISITERM_" and store the number of pieces in n
if (n==2 && a[2]<100) # n is 2 when the record contains "VISITERM_"; also require the number to be below 100
b[a[2]]++} # Count each number and store the count in array b
END {for (i in b) # Walk through array b
print "VISITERM_"i,b[i]} # Print each term and its count
' file* # Read the files
PS: If everything is on a single line, change it to RS=" "; then it should work with most awk implementations.
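Note that the awk command above counts occurrences, not files: VISITERM_20 appears twice in file2, so it is reported as 2 although only one file contains it. A per-file count could be sketched as below (a variant, not the original answer's code). It loops over the whitespace-separated fields instead of setting a multi-character RS, so it should run in any POSIX awk. The function name count_files_awk is hypothetical, and the pattern assumes terms are written without leading zeros (VISITERM_5, not VISITERM_05).

```shell
# Hypothetical helper: count_files_awk FILE...
# Prints "VISITERM_n count", where count is the number of FILES that contain
# VISITERM_n at least once, for n = 0..99, in numeric order.
count_files_awk() {
  awk '
    {
      # Default FS splits on whitespace, so each field is one token.
      for (f = 1; f <= NF; f++) {
        if ($f ~ /^VISITERM_[0-9][0-9]?$/) {
          t = substr($f, 10)              # digits after "VISITERM_"
          # seen[] makes each (file, term) pair count only once
          if (!seen[FILENAME, t]++) files[t]++
        }
      }
    }
    END {
      # Walk 0..99 explicitly so the output order is predictable
      for (i = 0; i < 100; i++)
        if (i in files) print "VISITERM_" i, files[i]
    }
  ' "$@"
}
```

On the example files, `count_files_awk file*` prints the same terms as above, but with VISITERM_20 reported as 1.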