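I know I am asking a lot, but maybe you can help with this one too.
a.txt contains words and b.txt contains strings.
I would like to know how many strings in b.txt end with a word from a.txt.
Example:
a.txt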
apple
peach
potato
b.txt
greenapple
bigapple
rottenapple
pinkpeach
xxlpotatoxxx
Output:
3 apple greenapple bigapple rottenapple
1 peach pinkpeach
I would prefer a grep solution, because it is faster than awk.
Can you help me?
Answer 0 (score: 3)
Here is an awk solution:
awk 'FNR==NR{a[$1]++;next} {for (i in a) {if ($0~i"$") {b[i]++;w[i]=w[i]?w[i] FS $0:$0}}} END {for (j in b) print b[j],j,w[j]}' a.txt b.txt
3 apple greenapple bigapple rottenapple
1 peach pinkpeach
How does it work? (It is not that complicated.)
awk '
FNR==NR {                      # Run this part for the first file (a.txt) only
    a[$1]++                    # Store each word as an index of array a
    next                       # Skip to the next record
}
{                              # Run this part for file b.txt
    for (i in a) {             # Loop through all words in array a
        if ($0 ~ i"$") {       # Does this line of b.txt end with the word?
            b[i]++             # Yes, count it
            w[i] = w[i] ? w[i] FS $0 : $0   # and append the line to the list in array w
        }
    }
}
END {                          # When both files have been read, run the END part
    for (j in b)               # Loop through all elements of array b and
        print b[j], j, w[j]    # print the count, the word, and the matching lines
}
' a.txt b.txt                  # Read the two files
Answer 1 (score: 1)
This solution relies only on bash and grep. IMHO it is easier to understand than the awk-only approach:
#!/bin/bash
# Set the input files in one place (usually a better idea than hardcoding them below)
WORDFILE=a.txt
SEARCHFILE=b.txt
# Read 'a.txt' word by word (i.e. line by line)
while IFS= read -r word; do
    # Get the number of hits
    num=$(grep -c -- "$word\$" "$SEARCHFILE")
    # If no line matches in 'b.txt', skip this word
    if [ "$num" -eq 0 ]; then
        continue
    fi
    # Print the number of hits and the search word
    printf "%d %s" "$num" "$word"
    # Print all lines from 'b.txt' that match
    for found in $(grep -- "$word\$" "$SEARCHFILE"); do
        printf " %s" "$found"
    done
    # Print a newline
    printf "\n"
done < "$WORDFILE"
Edit
If you want to store the results in a file, you can redirect the output of the script above in the usual way, e.g.
./find_matching_ends.sh > matching_ends.txt
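If you want to search for lines that start with the word instead, change the grep pattern from "$word\$" to "^$word". If you want this search to run alongside the search for matching endings, you need to move the redirection inside the script. Use >> rather than > there, because the statement runs once per word and > would overwrite the file on every iteration. For example,
...
printf "%d %s" "$num" "$word" >> matching_ends.txt
...
when you are searching for matching endings, and
...
printf "%d %s" "$num" "$word" >> matching_starts.txt
...
when you are looking for lines that start with the search word.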
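As an alternative to moving the redirection into the loop, the search could be wrapped in a function that takes the mode as an argument, so that each call is redirected once from the outside. The following is a rough sketch under that assumption; `report_matches` and its argument order are hypothetical names chosen for illustration, and `paste -s` is used here only to join the matching lines with spaces.

```shell
#!/bin/bash
# Hypothetical helper: report_matches MODE WORDFILE SEARCHFILE
# MODE is "start" or "end"; results are written to standard output.
report_matches() {
    local mode="$1" wordfile="$2" searchfile="$3" word pattern num
    while IFS= read -r word || [ -n "$word" ]; do
        [ -n "$word" ] || continue            # skip empty lines
        if [ "$mode" = start ]; then
            pattern="^$word"                  # line starts with the word
        else
            pattern="$word\$"                 # line ends with the word
        fi
        num=$(grep -c -- "$pattern" "$searchfile")
        [ "$num" -gt 0 ] || continue          # no hits: skip this word
        # Print count, word, and all matching lines joined by spaces
        printf '%d %s %s\n' "$num" "$word" \
            "$(grep -- "$pattern" "$searchfile" | paste -s -d ' ' -)"
    done < "$wordfile"
}

# Each output file is opened once per call, so a plain '>' is safe here:
# report_matches end   a.txt b.txt > matching_ends.txt
# report_matches start a.txt b.txt > matching_starts.txt
```

Because the redirection wraps the whole call, the output file is truncated once per run instead of once per word.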
Answer 2 (score: 0)
I would like to propose a Bash-based solution that avoids grep. Instead, it uses for loops and arrays:
#!/usr/bin/env bash
# Set mode: start | end
mode="end"
# Read contents of input files into arrays - line by line
IFS=$'\n' read -d '' -r -a patterns < "$1"
IFS=$'\n' read -d '' -r -a targets < "$2"
# Bash 4 can use readarray
#readarray -t patterns < "$1"
#readarray -t targets < "$2"
# Alternatively use cat to get the contents into arrays (slower)
#patterns=($(cat $1))
#targets=($(cat $2))
# Iterate over both arrays to compare the strings with each other
for pattern in "${patterns[@]}"; do
    # Set up a variable that counts the hits for each pattern
    hits_counter=0
    # Set up a variable that collects the matched strings for each pattern
    hits_match=""
    # Set up a regex pattern according to the user-defined mode
    if [[ "$mode" == "start" ]]; then
        regex="^${pattern}"
    elif [[ "$mode" == "end" ]]; then
        regex="${pattern}$"
    fi
    for target in "${targets[@]}"; do
        # Use regex pattern matching
        if [[ "$target" =~ $regex ]]; then
            # If we detect a match, increase the counter by 1
            (( hits_counter++ ))
            # If we detect a match, write it to our hits_match variable and append a space
            hits_match+="${target} "
        fi
    done
    # Print a result for each pattern if we have at least one match
    if (( hits_counter > 0 )); then
        printf "%i %s %s\n" "$hits_counter" "$pattern" "$hits_match"
    fi
done
This gives the following result:
./filter a.txt b.txt
3 apple greenapple bigapple rottenapple
1 peach pinkpeach
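Since `=~` interprets the pattern as a regular expression, words containing characters such as `.` or `*` may match more than intended. If literal matching is wanted, the comparison could instead be done with shell glob patterns, where a quoted expansion is taken as plain text. A minimal sketch, assuming a hypothetical helper named `literal_match` that is not part of the script above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: literal_match MODE STRING PATTERN
# MODE is "start" or "end". Returns 0 when STRING starts/ends with PATTERN,
# compared as plain text (the quoted "$3" is not treated as a glob or regex).
literal_match() {
    case "$1" in
        start) case "$2" in "$3"*) return 0 ;; esac ;;
        end)   case "$2" in *"$3") return 0 ;; esac ;;
    esac
    return 1
}

# Example: count the sample strings that end with "apple"
hits=0
for target in greenapple bigapple rottenapple pinkpeach xxlpotatoxxx; do
    if literal_match end "$target" apple; then
        hits=$((hits + 1))
    fi
done
echo "$hits"
```

In the script above, the call `literal_match "$mode" "$target" "$pattern"` could stand in for the `[[ "$target" =~ $regex ]]` test if regex semantics are not needed.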