BASH + how to verify whether the words in an array are contained in a variable

Time: 2018-01-04 13:54:52

Tags: linux bash perl awk sed

Hello friends and colleagues,

I wrote the following script to verify whether the words in an array are contained in the $list variable:

#!/bin/bash

list="sdb sdc sdd sde sdf sdg sdh sdi sdk sdj sdo"
array=( sdb sdd sde sdf sdg  )


function contain_word
{

contain=false
local count=0    # reset the counter on every call

[[ -z "${list// }" ]] && return

for arr in "${array[@]}"
do
 echo "$list" | grep -q "$arr"
 [[ $? -eq 0 ]] &&  (( count ++ ))
done

[[ ${#array[@]} -eq $count ]] && export contain=true

}

contain_word
echo "$contain"
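One caveat about the loop above: the `grep -q` call matches substrings, so a word such as `sd` would be counted as present in `$list` because it occurs inside `sdb`. A minimal sketch of the same loop with whole-word, fixed-string matching follows; it assumes the same globals `list` and `array`, `contain_word_w` is a hypothetical name, and it relies on `grep -w`, which GNU and BSD grep support but POSIX does not require. Returning at the first missing word removes the need for a counter.

```bash
#!/usr/bin/env bash
# Sketch: the question's loop with whole-word, fixed-string matching.
# Uses the same globals as the question: $list and array.
list="sdb sdc sdd sde sdf sdg sdh sdi sdk sdj sdo"
array=( sdb sdd sde sdf sdg )

contain_word_w() {    # hypothetical name
    local arr
    [[ -z ${list// } ]] && return 1                # blank list: nothing is contained
    for arr in "${array[@]}"; do
        # -F: treat $arr as a literal string; -w: match whole words only
        grep -qwF -- "$arr" <<<"$list" || return 1  # stop at the first missing word
    done
}

contain_word_w && echo true || echo false    # prints true
array=( sd sdd )
contain_word_w && echo true || echo false    # prints false: "sd" is only a substring
```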

This script does the job, but the code is long for such a simple purpose, and ugly.

I'd be glad to learn how to do this better (a bash / awk / perl one-liner, etc.).

Example 1

For

  list="sdb sdc sdd sde sdf sdg sdh sdi sdk sdj sdo"

  array=( sdb sdd sde sdf sdg  )

it prints true

Example 2

For

 list="sdb sdc sdd sde sdf sdg sdh sdi sdk sdj sdo"

 array=( sdw sdd sde sdf sdg  )

it prints false

6 answers:

Answer 0 (score: -1)

Here are a few lines of Python that do it:

[user@local ~/tmp/b] python

>>> list="sdb sdc sdd sde sdf sdg sdh sdi sdk sdj sdo"
>>> array="sdb sdd sde sdf sdg"
>>> set(array.split(" ")).issubset(set(list.split(" ")))
True

>>> list="sdb sdc sdd sde sdf sdg sdh sdi sdk sdj sdo"
>>> array="sdw sdd sde sdf sdg"
>>> set(array.split(" ")).issubset(set(list.split(" ")))
False

Answer 1 (score: -1)

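You can convert the scalar variable into an array variable and then compare the two arrays (not a one-liner; there are probably many ways to do it):

use strict;
use warnings;
my $list = "sdb sdc sdd sde sdf sdg sdh sdi sdk sdj sdo";
my @listary = split / /, $list;
my @myarray = qw(sdb sdd sde sdf sdg);

# One way to compare: every element of @myarray must appear in @listary
my %in_list = map { $_ => 1 } @listary;
print( (grep { !$in_list{$_} } @myarray) ? "false\n" : "true\n" );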
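The same two steps can be sketched in bash itself: `read -ra` splits the scalar into an array, and a nested loop compares the arrays element by element with exact string comparison. The helper name `contains_all` and its calling convention (the list first, then the words) are assumptions for illustration. The nested loop is O(N*M), which is harmless for a handful of device names but grows quickly for long lists.

```bash
#!/usr/bin/env bash
# Sketch: split the scalar into an array, then check that every remaining
# argument occurs in it. contains_all is a hypothetical helper name.
contains_all() {
    local list=$1 word item found
    shift
    local -a listary
    read -ra listary <<<"$list"    # split on whitespace (default IFS)
    for word in "$@"; do
        found=0
        for item in "${listary[@]}"; do
            [[ $item == "$word" ]] && { found=1; break; }    # exact string comparison
        done
        (( found )) || return 1    # this word is missing
    done
    return 0
}

list="sdb sdc sdd sde sdf sdg sdh sdi sdk sdj sdo"
myarray=( sdb sdd sde sdf sdg )
contains_all "$list" "${myarray[@]}" && echo true || echo false    # prints true
```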

Answer 2 (score: -1)

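My solution is based on awk: it turns list into a regular expression and uses it to delete every word of the list from the array. If nothing is left, it prints true; otherwise it prints false.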

awk -v list="$list" '
{
  gsub(" +","|",list)
  gsub(" *("list") *","")
  print ($0) ? "false" : "true"
}
' <<<"${array[*]}"
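To try the awk approach on both examples from the question, it can be wrapped in a function. This sketch moves the `gsub` on `list` into a `BEGIN` block and parenthesizes the whole conditional passed to `print`, so it does not depend on how a particular awk parses `print (x) ? ...`; the name `contain_word_re` is hypothetical. Like the original, it deletes matches anywhere in the input, so an input word that is a concatenation of listed words (for example `sdbsdc`) would also be reported as contained.

```bash
#!/usr/bin/env bash
# Sketch of the regex-removal approach, wrapped in a function.
# Relies on the globals $list and array, like the question's script.
contain_word_re() {    # hypothetical name
    awk -v list="$list" '
        BEGIN { gsub(/ +/, "|", list) }       # "sdb sdc ..." -> "sdb|sdc|..."
        {
            gsub(" *(" list ") *", "")         # delete every listed word from the input
            print (($0 == "") ? "true" : "false")
        }
    ' <<<"${array[*]}"
}

list="sdb sdc sdd sde sdf sdg sdh sdi sdk sdj sdo"
array=( sdb sdd sde sdf sdg ); contain_word_re    # prints true
array=( sdw sdd sde sdf sdg ); contain_word_re    # prints false
```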

Answer 3 (score: -1)

$ cat tst.sh
contain_word() {
    list="sdb sdc sdd sde sdf sdg sdh sdi sdk sdj sdo"

    printf '%s\n' "${array[@]}" |
    awk -v list="$list" '
        BEGIN {
            split(list,tmpArr)
            for (idx in tmpArr) {
                wordSet[tmpArr[idx]]
            }
        }
        !($0 in wordSet) {
            exit 1
        }
    '
}

array=( sdb sdd sde sdf sdg  )
contain_word
printf '%s -> %s\n' "${array[*]}" "$?"

array=( sdw sdd sde sdf sdg )
contain_word
printf '%s -> %s\n' "${array[*]}" "$?"

$ ./tst.sh
sdb sdd sde sdf sdg -> 0
sdw sdd sde sdf sdg -> 1

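The above uses full string comparison, so partial matches and false regex matches are impossible. It also won't fail because of globbing, and it works with any awk in any shell on any UNIX box (as long as the shell supports arrays with the syntax you used). You can of course tweak the awk code or the calling shell code to print true or false instead of relying only on awk's exit status.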
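A minimal sketch of that tweak, with awk printing the verdict itself. It assumes a different calling convention from the answer's (the list is passed as `$1` and the words as the remaining arguments, instead of using globals), and `contain_word_tf` is a hypothetical name. `exit` inside a main rule still runs the `END` block, which is where the result is printed.

```bash
#!/usr/bin/env bash
# Sketch: whole-string set membership in awk, printing "true" or "false".
# Assumed usage: contain_word_tf "$list" word1 word2 ...
contain_word_tf() {
    local list=$1
    shift
    printf '%s\n' "$@" |
    awk -v list="$list" '
        BEGIN {
            n = split(list, tmpArr)              # split on runs of blanks
            for (i = 1; i <= n; i++)
                wordSet[tmpArr[i]]               # referencing an element creates the key
            found = 1
        }
        !($0 in wordSet) { found = 0; exit }     # exit still runs the END block
        END { print (found ? "true" : "false") }
    '
}

list="sdb sdc sdd sde sdf sdg sdh sdi sdk sdj sdo"
contain_word_tf "$list" sdb sdd sde sdf sdg    # prints true
contain_word_tf "$list" sdw sdd sde sdf sdg    # prints false
```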

Answer 4 (score: -1)

Edit2: With Perl it is nice, simple and very efficient.

perl -e'@h{split/ /,shift}=();exists$h{$_}||exit 1 for@ARGV' "$list" "${array[@]}" && echo "true" || echo "false"

Original

It would be simpler if the array were in a file, but anyway:

list="sdb sdc sdd sde sdf sdg sdh sdi sdk sdj sdo"
array=( sdb sdd sde sdf sdg )
[[ $(echo $list | sed 's/ /\n/g' | sort -u | grep -xFf <(echo ${array[@]} | sed 's/ /\n/g') | wc -l) -eq ${#array[@]} ]] && echo "true" || echo "false"

or shorter:

[[ $(sed 's/ /\n/g' <<<$list | sort -u | grep -xFf <(sed 's/ /\n/g' <<<${array[@]}) | wc -l) -eq ${#array[@]} ]] && echo "true" || echo "false"

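Why would anyone loop over the array? The loop runs one grep over the whole list for every array element, and there is a difference between O(N*M) and O(N+M).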
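For comparison, here is another pipeline in the same "sort once, compare once" spirit, not taken from this answer: `comm -13` on the two sorted word lists prints only the array words that are missing from `$list`, so empty output means every word is present. Sorting makes it O((N+M) log(N+M)) rather than strictly linear, which is still far from O(N*M). `contain_word_comm` is a hypothetical name, and `LC_ALL=C` is set so that `sort` and `comm` agree on the collation order.

```bash
#!/usr/bin/env bash
# Sketch (an alternative, not part of the original answer):
# comm -13 prints the lines found only in the second sorted input,
# i.e. the array words missing from $list. No output means all are present.
contain_word_comm() {    # hypothetical name
    local missing
    missing=$(LC_ALL=C comm -13 \
        <(tr -s ' ' '\n' <<<"$list" | LC_ALL=C sort -u) \
        <(printf '%s\n' "${array[@]}" | LC_ALL=C sort -u))
    [[ -z $missing ]]
}

list="sdb sdc sdd sde sdf sdg sdh sdi sdk sdj sdo"
array=( sdb sdd sde sdf sdg )
contain_word_comm && echo true || echo false    # prints true
array=( sdw sdd sde sdf sdg )
contain_word_comm && echo true || echo false    # prints false
```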

Edit: It seems that understanding how computers work and what O notation means is not as common as I expected, so here is a small demonstration.

#!/bin/bash

list="aaa aab aac aad aae aaf aag aah aai aaj aak aal aam aan aao aap aaq aar aas aat aau aav aaw aax aay aaz aba abb abc abd abe abf abg abh abi abj abk abl abm abn abo abp abq abr abs abt abu abv abw abx aby abz aca acb acc acd ace acf acg ach aci acj ack acl acm acn aco acp acq acr acs act acu acv acw acx acy acz ada adb adc add ade adf adg adh adi adj adk adl adm adn ado adp adq adr ads adt adu adv adw adx ady adz aea aeb aec aed aee aef aeg aeh aei aej aek ael aem aen aeo aep aeq aer aes aet aeu aev aew aex aey aez afa afb afc afd afe aff afg afh afi afj afk afl afm afn afo afp afq afr afs aft afu afv afw afx afy afz aga agb agc agd age agf agg agh agi agj agk agl agm agn ago agp agq agr ags agt agu agv agw agx agy agz aha ahb ahc ahd ahe ahf ahg ahh ahi ahj ahk ahl ahm ahn aho ahp ahq ahr ahs aht ahu ahv ahw ahx ahy ahz aia aib aic aid aie aif aig aih aii aij aik ail aim ain aio aip aiq air ais ait aiu aiv aiw aix aiy aiz aja ajb ajc ajd aje ajf ajg ajh aji ajj ajk ajl ajm ajn ajo ajp ajq ajr ajs ajt aju ajv ajw ajx ajy ajz aka akb akc akd ake akf akg akh aki akj akk akl akm akn ako akp akq akr aks akt aku akv akw akx aky akz ala alb alc ald ale alf alg alh ali alj alk all alm aln alo alp alq alr als alt alu alv alw alx aly alz ama amb amc amd ame amf amg amh ami amj amk aml amm amn amo amp amq amr ams amt amu amv amw amx amy amz ana anb anc and ane anf ang anh ani anj ank anl anm ann ano anp anq anr ans ant anu anv anw anx any anz aoa aob aoc aod aoe aof aog aoh aoi aoj aok aol aom aon aoo aop aoq aor aos aot aou aov aow aox aoy aoz apa apb apc apd ape apf apg aph api apj apk apl apm apn apo app apq apr aps apt apu apv apw apx apy apz aqa aqb aqc aqd aqe aqf aqg aqh aqi aqj aqk aql aqm aqn aqo aqp aqq aqr aqs aqt aqu aqv aqw aqx aqy aqz ara arb arc ard are arf arg arh ari arj ark arl arm arn aro arp arq arr ars art aru arv arw arx ary arz asa asb asc asd ase asf asg ash asi asj ask asl asm asn aso asp asq asr ass ast asu asv asw asx asy asz ata atb atc atd 
ate atf atg ath ati atj atk atl atm atn ato atp atq atr ats att atu atv atw atx aty atz aua aub auc aud aue auf aug auh aui auj auk aul aum aun auo aup auq aur aus aut auu auv auw aux auy auz ava avb avc avd ave avf avg avh avi avj avk avl avm avn avo avp avq avr avs avt avu avv avw avx avy avz awa awb awc awd awe awf awg awh awi awj awk awl awm awn awo awp awq awr aws awt awu awv aww awx awy awz axa axb axc axd axe axf axg axh axi axj axk axl axm axn axo axp axq axr axs axt axu axv axw axx axy axz aya ayb ayc ayd aye ayf ayg ayh ayi ayj ayk ayl aym ayn ayo ayp ayq ayr ays ayt ayu ayv ayw ayx ayy ayz aza azb azc azd aze azf azg azh azi azj azk azl azm azn azo azp azq azr azs azt azu azv azw azx azy azz baa bab bac bad bae baf bag bah bai baj bak bal bam ban bao bap baq bar bas bat bau bav baw bax bay baz bba bbb bbc bbd bbe bbf bbg bbh bbi bbj bbk bbl bbm bbn bbo bbp bbq bbr bbs bbt bbu bbv bbw bbx bby bbz bca bcb bcc bcd bce bcf bcg bch bci bcj bck bcl bcm bcn bco bcp bcq bcr bcs bct bcu bcv bcw bcx bcy bcz bda bdb bdc bdd bde bdf bdg bdh bdi bdj bdk bdl bdm bdn bdo bdp bdq bdr bds bdt bdu bdv bdw bdx bdy bdz bea beb bec bed bee bef beg beh bei bej bek bel bem ben beo bep beq ber bes bet beu bev bew bex bey bez bfa bfb bfc bfd bfe bff bfg bfh bfi bfj bfk bfl bfm bfn bfo bfp bfq bfr bfs bft bfu bfv bfw bfx bfy bfz bga bgb bgc bgd bge bgf bgg bgh bgi bgj bgk bgl bgm bgn bgo bgp bgq bgr bgs bgt bgu bgv bgw bgx bgy bgz bha bhb bhc bhd bhe bhf bhg bhh bhi bhj bhk bhl bhm bhn bho bhp bhq bhr bhs bht bhu bhv bhw bhx bhy bhz bia bib bic bid bie bif big bih bii bij bik bil bim bin bio bip biq bir bis bit biu biv biw bix biy biz bja bjb bjc bjd bje bjf bjg bjh bji bjj bjk bjl bjm bjn bjo bjp bjq bjr bjs bjt bju bjv bjw bjx bjy bjz bka bkb bkc bkd bke bkf bkg bkh bki bkj bkk bkl bkm bkn bko bkp bkq bkr bks bkt bku bkv bkw bkx bky bkz bla blb blc bld ble blf blg blh bli blj blk bll blm bln blo blp blq blr bls blt blu blv blw blx bly blz bma bmb bmc bmd bme bmf bmg bmh bmi bmj 
bmk bml"
array=(
aaa aab aac aad aae aaf aag aah aai aaj aak aal aam aan aao aap aaq aar aas
aat aau aav aaw aax aay aaz aba abb abc abd abe abf abg abh abi abj abk abl
abm abn abo abp abq abr abs abt abu abv abw abx aby abz aca acb acc acd ace
acf acg ach aci acj ack acl acm acn aco acp acq acr acs act acu acv acw acx
acy acz ada adb adc add ade adf adg adh adi adj adk adl adm adn ado adp adq
adr ads adt adu adv adw adx ady adz aea aeb aec aed aee aef aeg aeh aei aej
aek ael aem aen aeo aep aeq aer aes aet aeu aev aew aex aey aez afa afb afc
afd afe aff afg afh afi afj afk afl afm afn afo afp afq afr afs aft afu afv
afw afx afy afz aga agb agc agd age agf agg agh agi agj agk agl agm agn ago
agp agq agr ags agt agu agv agw agx agy agz aha ahb ahc ahd ahe ahf ahg ahh
ahi ahj ahk ahl ahm ahn aho ahp ahq ahr ahs aht ahu ahv ahw ahx ahy ahz aia
aib aic aid aie aif aig aih aii aij aik ail aim ain aio aip aiq air ais ait
aiu aiv aiw aix aiy aiz aja ajb ajc ajd aje ajf ajg ajh aji ajj ajk ajl ajm
ajn ajo ajp ajq ajr ajs ajt aju ajv ajw ajx ajy ajz aka akb akc akd ake akf
akg akh aki akj akk akl akm akn ako akp akq akr aks akt aku akv akw akx aky
akz ala alb alc ald ale alf alg alh ali alj alk all alm aln alo alp alq alr
als alt alu alv alw alx aly alz ama amb amc amd ame amf amg amh ami amj amk
aml amm amn amo amp amq amr ams amt amu amv amw amx amy amz ana anb anc and
ane anf ang anh ani anj ank anl anm ann ano anp anq anr ans ant anu anv anw
anx any anz aoa aob aoc aod aoe aof aog aoh aoi aoj aok aol aom aon aoo aop
aoq aor aos aot aou aov aow aox aoy aoz apa apb apc apd ape apf apg aph api
apj apk apl apm apn apo app apq apr aps apt apu apv apw apx apy apz aqa aqb
aqc aqd aqe aqf aqg aqh aqi aqj aqk aql aqm aqn aqo aqp aqq aqr aqs aqt aqu
aqv aqw aqx aqy aqz ara arb arc ard are arf arg arh ari arj ark arl arm arn
aro arp arq arr ars art aru arv arw arx ary arz asa asb asc asd ase asf asg
ash asi asj ask asl asm asn aso asp asq asr ass ast asu asv asw asx asy asz
ata atb atc atd ate atf atg ath ati atj atk atl atm atn ato atp atq atr ats
att atu atv atw atx aty atz aua aub auc aud aue auf aug auh aui auj auk aul
aum aun auo aup auq aur aus aut auu auv auw aux auy auz ava avb avc avd ave
avf avg avh avi avj avk avl avm avn avo avp avq avr avs avt avu avv avw avx
avy avz awa awb awc awd awe awf awg awh awi awj awk awl awm awn awo awp awq
awr aws awt awu awv aww awx awy awz axa axb axc axd axe axf axg axh axi axj
axk axl axm axn axo axp axq axr axs axt axu axv axw axx axy axz aya ayb ayc
ayd aye ayf ayg ayh ayi ayj ayk ayl aym ayn ayo ayp ayq ayr ays ayt ayu ayv
ayw ayx ayy ayz aza azb azc azd aze azf azg azh azi azj azk azl azm azn azo
azp azq azr azs azt azu azv azw azx azy azz baa bab bac bad bae baf bag bah
bai baj bak bal bam ban bao bap baq bar bas bat bau bav baw bax bay baz bba
bbb bbc bbd bbe bbf bbg bbh bbi bbj bbk bbl bbm bbn bbo bbp bbq bbr bbs bbt
bbu bbv bbw bbx bby bbz bca bcb bcc bcd bce bcf bcg bch bci bcj bck bcl bcm
bcn bco bcp bcq bcr bcs bct bcu bcv bcw bcx bcy bcz bda bdb bdc bdd bde bdf
bdg bdh bdi bdj bdk bdl bdm bdn bdo bdp bdq bdr bds bdt bdu bdv bdw bdx bdy
bdz bea beb bec bed bee bef beg beh bei bej bek bel bem ben beo bep beq ber
bes bet beu bev bew bex bey bez bfa bfb bfc bfd bfe bff bfg bfh bfi bfj bfk
bfl bfm bfn bfo bfp bfq bfr bfs bft bfu bfv bfw bfx bfy bfz bga bgb bgc bgd
bge bgf bgg bgh bgi bgj bgk bgl bgm bgn bgo bgp bgq bgr bgs bgt bgu bgv bgw
bgx bgy bgz bha bhb bhc bhd bhe bhf bhg bhh bhi bhj bhk bhl bhm bhn bho bhp
bhq bhr bhs bht bhu bhv bhw bhx bhy bhz bia bib bic bid bie bif big bih bii
bij bik bil bim bin bio bip biq bir bis bit biu biv biw bix biy biz bja bjb
bjc bjd bje bjf bjg bjh bji bjj bjk bjl bjm bjn bjo bjp bjq bjr bjs bjt bju
bjv bjw bjx bjy bjz bka bkb bkc bkd bke bkf bkg bkh bki bkj bkk bkl bkm bkn
bko bkp bkq bkr bks bkt bku bkv bkw bkx bky bkz bla blb blc bld ble blf blg
blh bli blj blk bll blm bln blo blp blq blr bls blt blu blv blw blx bly blz
bma bmb bmc bmd bme bmf bmg bmh bmi bmj bmk bml
);

function contain_word
{

[[ -z "${list// }" ]] && return 1

for arr in ${array[*]}
do
 echo "$list" | grep -q $arr
 [[ $? -eq 0 ]] &&  (( count ++ ))
done

[[ ${#array[@]} -eq $count ]]

}

function contain_word2
{
    [[ $(sed 's/ /\n/g' <<<$list | sort -u | grep -Ff <(sed 's/ /\n/g' <<<${array[@]}) | wc -l) -eq ${#array[@]} ]]
}

contain_word$1 && echo "true" || echo "false"

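A simple demonstration of what O(M*N) vs O(M+N) means for M = N = 1000, which is not much for modern hardware, is it? (./test.sh runs the loop version contain_word; ./test.sh 2 runs contain_word2.)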

$ time ./test.sh
true

real    0m0.989s
user    0m1.040s
sys     0m0.319s
$ time ./test.sh 2
true

real    0m0.011s
user    0m0.012s
sys     0m0.000s

Even for M = N = 100, with these values in test.sh:

list="aaa aab aac aad aae aaf aag aah aai aaj aak aal aam aan aao aap aaq aar aas aat aau aav aaw aax aay aaz aba abb abc abd abe abf abg abh abi abj abk abl abm abn abo abp abq abr abs abt abu abv abw abx aby abz aca acb acc acd ace acf acg ach aci acj ack acl acm acn aco acp acq acr acs act acu acv acw acx acy acz ada adb adc add ade adf adg adh adi adj adk adl adm adn ado adp adq adr ads adt adu adv"
array=(
aaa aab aac aad aae aaf aag aah aai aaj aak aal aam aan aao aap aaq aar aas
aat aau aav aaw aax aay aaz aba abb abc abd abe abf abg abh abi abj abk abl
abm abn abo abp abq abr abs abt abu abv abw abx aby abz aca acb acc acd ace
acf acg ach aci acj ack acl acm acn aco acp acq acr acs act acu acv acw acx
acy acz ada adb adc add ade adf adg adh adi adj adk adl adm adn ado adp adq
adr ads adt adu adv
)
$ time ./test.sh
true

real    0m0.117s
user    0m0.105s
sys     0m0.042s
$ time ./test.sh 2
true

real    0m0.008s
user    0m0.008s
sys     0m0.001s

That shows how inefficient the loop is.

BTW, using Perl is more elegant than AWK:

function contain_word3
{
    perl -e'@h{split/ /,shift}=();exists$h{$_}||exit 1 for@ARGV' "$list" "${array[@]}"
}

and fast (8 ms).

Answer 5 (score: -2)

A perl one-liner:

perl -ape 'BEGIN{$H{$_}=1while$_=shift}$_=!grep!$H{$_},@F' $list <<<"${array[*]}"

prints

1

perl -ape 'BEGIN{$H{$_}=1while$_=shift}$_=!grep!$H{$_},@F' $list <<<"${array[*]} not"

prints nothing, because not is not in $list.

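How it works (see perl -h for the command-line switches: -a splits each input line into @F, -p prints $_ after each line, -e supplies the code):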

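  • BEGIN{$H{$_}=1while$_=shift}: fills the hash %H with the elements of @ARGV as keys and 1 as values, emptying @ARGV along the way, so that -p reads the here-string from standard input instead of opening the words as files
  • $_=!grep!$H{$_},@F: grep selects the fields of the input line (@F) that are not keys of %H; in the scalar context imposed by !, it returns how many there are, and ! turns that count into 1 if it is 0, or into an empty string if it is >0. -p then prints $_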

Otherwise, it can also be done in bash >= 4.0 using an associative array:


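declare -A hashlist=([sdg]="1" [sdf]="1" [sde]="1" [sdd]="1" [sdc]="1" [sdb]="1" [sdo]="1" [sdk]="1" [sdj]="1" [sdi]="1" [sdh]="1" )

array=( sdb sdd sde sdf sdg  )

r=0; for a in "${array[@]}"; do ((r|=\!hashlist[$a])); done ;((r=\!r))

This follows the same logic as the perl version. With the logic inverted, it can be simplified:

r=1; for a in "${array[@]}"; do ((r&=hashlist[$a])); done

It can also be optimized to break out of the loop as soon as an item is not found:

r=1; for a in "${array[@]}"; do ((r&=hashlist[$a])) || break; done

r=1 if all entries were found, 0 otherwise.

Note that, because of the -H shell option (history expansion, on by default in interactive shells), ! must be escaped as \! only on the command line; in a script the backslash must be removed.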
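The hashlist literal above is written out by hand. A sketch of building it from $list instead, wrapped in a function with the early exit; it needs bash >= 4.0 for associative arrays, assumes $list is a single line of blank-separated words, and `contain_word_assoc` is a hypothetical name. Since it is meant for a script, it avoids `!` altogether by testing for a non-empty value.

```bash
#!/usr/bin/env bash
# Requires bash >= 4.0 for associative arrays.
# Sketch: build the lookup table from $list instead of writing it by hand,
# then check each element of array (both are globals, as in the question).
contain_word_assoc() {    # hypothetical name
    local -A hashlist=()
    local -a words
    local w a
    read -ra words <<<"$list"    # assumes $list is one line of blank-separated words
    for w in "${words[@]}"; do
        hashlist[$w]=1
    done
    for a in "${array[@]}"; do
        [[ -n ${hashlist[$a]} ]] || return 1    # first missing word: not contained
    done
}

list="sdb sdc sdd sde sdf sdg sdh sdi sdk sdj sdo"
array=( sdb sdd sde sdf sdg )
contain_word_assoc && echo true || echo false    # prints true
array=( sdw sdd sde sdf sdg )
contain_word_assoc && echo true || echo false    # prints false
```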