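I'm trying to clean up a string by removing specific words. I have code that works, but it's neither pretty nor robust.
Input: the_for_an_apple_this
Words to remove: the, for, an
Output: apple_this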
#!/bin/bash
str="the_for_an_apple_this"
echo "$str"
# looping is done because after one awk gsub pass the next match wouldn't work
counter=0
while [ $counter -le 10 ]
do
# replace each match with "_" (is this correct? it seems to work)
str=$(echo "$str" | awk '{gsub(/(^|_)(the|for|an)($|_)/,"_")}1')
((counter++))
echo "$str"
done
# remove beginning or trailing _
str=$(echo "$str" | awk '{gsub(/^_/,"")}1' | awk '{gsub(/_$/,"")}1')
echo "$str"
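The counter exists because gsub matches cannot overlap: once "the_" has been replaced, the "_" in front of "for" is already consumed, so "for" survives that pass. A minimal sketch of one way to drop the fixed counter, assuming the same awk regex is kept: repeat the substitution until the string stops changing. The function name strip_words is hypothetical.

```shell
#!/bin/bash
# Sketch: rerun the asker's gsub until a pass changes nothing,
# instead of running a fixed number of passes.
strip_words() {
    local str=$1 prev=''
    while [[ $str != "$prev" ]]; do
        prev=$str
        str=$(printf '%s\n' "$str" | awk '{gsub(/(^|_)(the|for|an)($|_)/,"_")}1')
    done
    # trim one leading and one trailing underscore, like the two awk calls above
    str=${str#_}
    str=${str%_}
    printf '%s\n' "$str"
}

strip_words "the_for_an_apple_this"
```

The loop always terminates: a pass either shortens the string or leaves it unchanged.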
A testable version is here: http://rextester.com/BHYSP47270
How can I clean this up and make it work without the fragile counter?
Answer 0 (score: 3)
Using only native bash logic:
#!/bin/bash
remove_stopwords() {
    local old_settings=$-   # store original shell settings so we can undo set -f
    local -a words=( )      # create "words" array as a local variable
    local word              # keep the loop variable local as well
    local IFS=_             # make the underscore the only character that separates words
    set -f                  # disable globbing to make unquoted expansion safe
    for word in $1; do      # split $1 on chars in IFS (underscores) and iterate
        case $word in "the"|"for"|"an") continue;; esac  # skip stopwords
        words+=( "$word" )  # put words we didn't skip into our array
    done
    echo "${words[*]}"      # join words with underscores (first IFS character) and echo
    if ! [[ $old_settings = *f* ]]; then set +f; fi  # undo "set -f"
}
str="the_for_an_apple_this"
remove_stopwords "$str"
You can see it running at https://ideone.com/hrd1vA
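The function rests on two bash behaviors: an unquoted expansion is split on the characters in IFS, and "${array[*]}" joins the elements using the first character of IFS. A small illustrative sketch (the name split_and_join is hypothetical) that shows both, switching the join character to "-" so the effect is visible:

```shell
#!/bin/bash
# Illustrates the splitting and joining that remove_stopwords relies on.
split_and_join() (
    IFS=_                       # split unquoted expansions on "_"
    set -f                      # keep the unquoted expansion from globbing
    parts=( $1 )                # "the_for_an" -> (the for an)
    echo "${#parts[@]}"         # number of fields found
    IFS=-                       # "${parts[*]}" joins with IFS's first character
    echo "${parts[*]}"
)

split_and_join "the_for_an_apple_this"
```

With the input above this prints 5 and then the-for-an-apple-this.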
Or, more concisely, run the function body in a subshell (this edit also uses more bash-only features):
remove_stopwords() (  # parentheses launch a subshell
    words=( )
    IFS=_
    set -f  # disable globbing
    for word in $1; do  # unquoted for word splitting
        [[ $word == @(the|for|an) ]] || words+=( "$word" )
    done
    echo "${words[*]}"
)
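If the stopword list should not be hard-coded, one possible variant (a sketch, not part of the original answer; the name remove_words is hypothetical) passes the words as extra arguments and looks them up in an associative array, which assumes bash 4 or later:

```shell
#!/bin/bash
# Hypothetical variant: stopwords are given as arguments 2..n.
remove_words() (
    str=$1; shift
    declare -A stop=( )                  # needs bash 4+
    for w in "$@"; do stop[$w]=1; done   # assumes no empty stopword arguments
    IFS=_
    set -f
    kept=( )
    for word in $str; do                 # unquoted on purpose: split on "_"
        # empty fields (from "__") are kept; an empty subscript would be invalid
        if [[ -n $word ]] && [[ -n ${stop[$word]+set} ]]; then
            continue
        fi
        kept+=( "$word" )
    done
    echo "${kept[*]}"                    # joined with "_"
)

remove_words "the_for_an_apple_this" the for an
```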
Answer 1 (score: 3)
How about using awk alone?
$ tail file1 file2
==> file1 <==
an_for_the
==> file2 <==
the_for_an_apple_this
$ awk 'BEGIN{RS=ORS="_"} NR==FNR{r[$1];next} ($1 in r){next} 1' file1 file2
apple_this
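This reads your "exclusion" string (stored in file1) and stores its underscore-separated words as indices of an array. It then walks through the input string (stored in file2) using the same record separator and skips any record that is a member of the array built in the first step.
Some fine-tuning of line endings may be needed: the last record of file2 still ends in a newline, and ORS="_" is appended after it, so the output actually ends with a newline followed by a stray underscore.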
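A sketch of the same awk idea that sidesteps the line-ending issue, assuming the stop list is passed as an underscore-separated string via -v instead of a file (filter_words is a hypothetical name): the input is fed without a trailing newline, and the kept words are joined manually so no stray "_" is printed.

```shell
#!/bin/bash
# Sketch: $1 is the input string, $2 the underscore-separated stop list.
filter_words() {
    printf '%s' "$1" | awk -v stop="$2" '
        BEGIN { RS = "_"; n = split(stop, s, "_"); for (i = 1; i <= n; i++) r[s[i]] }
        !($0 in r) { out = (out == "" ? $0 : out "_" $0) }   # keep non-stopwords
        END { print out }'
}

filter_words "the_for_an_apple_this" "the_for_an"
```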
Answer 2 (score: 2)
You can do this with bash alone:
shopt -s extglob
str="the_for_an_apple_this"
for word in "the" "for" "an"; do
    str=${str//$word/}   # delete every occurrence of this substring
done
str=${str//+(_)/_}; str=${str#_}; str=${str%_}
You can drop the loop by using an extglob alternation:
shopt -s extglob
str="the_for_an_apple_this"
str=${str//@(the|for|an)/}
str=${str//+(_)/_}; str=${str#_}; str=${str%_}
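In this solution we take advantage of the extended glob option (extglob), which originates in KSH: +(pattern-list) matches one or more occurrences of the given patterns, and @(pattern-list) matches exactly one of them.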
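Note that ${str//the/} removes substrings, not whole words, so for example "theater" would lose its leading "the". A sketch of a whole-word variant that stays loop-free (remove_whole_words is a hypothetical name, and the underscore-doubling trick is not taken from the answer): double every underscore so each word has its own delimiters, remove complete _word_ tokens, then collapse the doubled underscores.

```shell
#!/bin/bash
shopt -s extglob   # needed for @(...); must be set before the function is parsed

remove_whole_words() {
    local s="_${1//_/__}_"            # the_for -> _the__for_
    s=${s//_@(the|for|an)_/}          # drop complete stopword tokens only
    s=${s//__/_}                      # restore single underscores
    s=${s#_}; s=${s%_}                # trim the padding
    printf '%s\n' "$s"
}

remove_whole_words "the_theater_an_banana"   # prints: theater_banana
```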
Answer 3 (score: 0)
Just for fun, a Perl version:
perl -lne '
%remove = map {$_=>1} qw(the for an);
print join "_", grep {$_ and not $remove{$_}} split /_/;
' <<< "the_for_an_apple__the_this_for"
apple_this
Or a case-insensitive version:
perl -lne '
%remove = map {uc,1} qw(the for an);
print join "_", grep {$_ and not $remove{+uc}} split /_/;
' <<< "tHe_For_aN_aPple__thE_This_fOr"
aPple_This
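For comparison, a sketch of a case-insensitive bash counterpart to the Perl version (remove_stopwords_ci is a hypothetical name): the nocasematch option makes pattern matching in [[ == ]] ignore case, and empty fields are dropped, as the Perl grep does.

```shell
#!/bin/bash
shopt -s extglob   # set before the function is defined so @(...) parses

remove_stopwords_ci() (
    shopt -s nocasematch                      # [[ == ]] now ignores case
    set -f
    IFS=_
    kept=( )
    for word in $1; do                        # unquoted: split on "_"
        [[ -n $word ]] || continue            # drop empty fields from "__"
        [[ $word == @(the|for|an) ]] || kept+=( "$word" )
    done
    echo "${kept[*]}"
)

remove_stopwords_ci "tHe_For_aN_aPple__thE_This_fOr"   # prints: aPple_This
```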