Finding duplicate words in GNU make

Date: 2017-05-23 21:22:05

Tags: makefile gnu-make

I'm looking for a good way to find all words that occur more than once in a string. Some constraints apply:

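  • Sub-quadratic speed is required: there are about 1000 words, and I can afford a few milliseconds.
  • It must be implemented in pure make:
    • I want to avoid $(shell), because it is expensive and the makefile has to work on Windows (on plain Linux, sort | uniq -d would solve my problem nicely).
    • No Guile, because I have no control over which make version is used and need to stay compatible with old make versions (3.81).
  • The implementation should be reasonably readable.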

In addition, duplicates will be rare, and words will only contain well-behaved characters, e.g. [-_+a-zA-Z0-9]+.
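For reference, and as a way to cross-check the pure-make versions on a Unix machine, the $(shell) pipeline that the question rules out could look like the sketch below. uniq -d prints one copy of each line that occurs more than once (uniq -u would instead print only the lines that are not repeated). The name dups_shell is hypothetical, and LC_ALL=C is an added assumption to get a locale-independent sort order.

```make
# Unix-only baseline that the question rules out: a single $(shell) call.
# printf puts one word per line, sort groups equal words, uniq -d keeps
# one copy of each repeated line; $(shell) turns the newlines into spaces.
dups_shell = $(shell printf '%s\n' $1 | LC_ALL=C sort | uniq -d)

D := $(call dups_shell,b a c a b)
$(info $(D))

# Minimal target so the snippet can be run with make -f
.PHONY: all
all: ; @:
```

This relies on sort and uniq being available, which is exactly what fails on a plain Windows setup.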

I have tried two strategies:

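(1) Force $(sort) to keep duplicates (append a unique suffix to every word, sort, and strip the suffixes again). Then find adjacent identical words in the sorted list: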

# given 0 1 0 1 0 1 0 1 ... , return 0 0 1 1 0 0 1 1 ...
double=$(wordlist 1,$(words $(1)),$(subst 0,0 0,$(subst 1,1 1,$(1))))
# Produce a list of N unique strings. $(1) contains N words, with a
# repetition cycle of length M, and $(2) contains N words, either 0 or
# 1, alternating between 0 and 1 every Mth word.
binseq=$(if $(findstring 1,$(2)),$(call binseq,$(join $(2),$(1)),$(call double,$(2))),$(1))
# return 0 1 0 1 ..., as many words as $(1)
alternating_bits=$(wordlist 1,$(words $(1)),$(patsubst %,0 1,$(1)))
# Produce as many unique words as there are words in $(1)
unique=$(call binseq,,$(call alternating_bits,$(1)))
# Sort $(1) without eliminating duplicates. $(1) must not contain /.
sorted_keep_dups=$(subst /,,$(dir $(sort $(join $(1:=/),$(call unique,$(1))))))

dups_from_sorted2=$(filter $(patsubst %0,%,$(filter %0,$(1))),$(patsubst %1,%,$(filter %1,$(1))))
# Given a sorted list, return all duplicates.
dups_from_sorted=$(sort $(call dups_from_sorted2,$(join $(1),$(call alternating_bits,$(1)))))

dups=$(call dups_from_sorted,$(call sorted_keep_dups,$(1)))
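To make strategy (1) easier to follow, here is a hypothetical walk-through on a fixed five-word list. The hand-written tags 0 to 4 stand in for the suffixes produced by $(call unique,...), which for four words yields 00 01 10 11. The variable names are illustrative only.

```make
# Walk-through of strategy (1) on a fixed list.
list   := b a c a b
tags   := 0 1 2 3 4

# b/0 a/1 c/2 a/3 b/4: every word is now distinct, so $(sort) keeps all five
tagged := $(join $(patsubst %,%/,$(list)),$(tags))
# a/1 a/3 b/0 b/4 c/2: equal words end up next to each other
sorted := $(sort $(tagged))
# a a b b c: $(dir) keeps "word/", then the slash is removed
plain  := $(subst /,,$(dir $(sorted)))

# Mark neighbours alternately with 0 and 1: a0 a1 b0 b1 c0
marked := $(join $(plain),0 1 0 1 0)
# A word seen with both marks sits next to a copy of itself: a b
dups   := $(sort $(filter $(patsubst %0,%,$(filter %0,$(marked))),$(patsubst %1,%,$(filter %1,$(marked)))))

$(info $(dups))

# Minimal target so the snippet can be run with make -f
.PHONY: all
all: ; @:
```

Because equal words are contiguous after sorting, any two copies of a word include a neighbouring pair with different marks, which is why a single alternating 0/1 pass is enough here.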

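(2) Apply $(filter) repeatedly to different partitions of the word list, so that every pair of words ends up in different arguments of $(filter) at least once: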

# given 0 1 0 1 0 1 0 1 ... , return 0 0 1 1 0 0 1 1 ...
double=$(wordlist 1,$(words $(1)),$(subst 0,0 0,$(subst 1,1 1,$(1))))
# given words with suffix 0 or 1, remove suffixes and return the words
# that occur both with 0 and 1 as suffix
filter_dups=$(filter $(patsubst %0,%,$(filter %0,$(1))),$(patsubst %1,%,$(filter %1,$(1))))
_dups=$(if $(findstring 1,$(2)),$(call filter_dups,$(join $(1),$(2))) \
$(call _dups,$(1),$(call double,$(2))))
# return 0 1 0 1 ..., as many words as $(1)
alternating_bits=$(wordlist 1,$(words $(1)),$(patsubst %,0 1,$(1)))
# given a list of words, return the list of words that occur more than once
dups=$(sort $(call _dups,$(1),$(call alternating_bits,$(1))))
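Strategy (2) uses bit k of each word's position as that word's mark in pass k. Two different positions always differ in at least one bit, so every pair of equal words lands on opposite sides of $(filter) in some pass, and about log2 N passes suffice. The hypothetical walk-through below spells out the three passes for five words; filter_dups repeats the question's helper so that the snippet runs on its own.

```make
# Walk-through of strategy (2) on a fixed list.
list := b a c a b

# The question's filter_dups: words marked 0 become patterns, words marked 1
# are the text, so a word survives only if it carries both marks.
filter_dups = $(filter $(patsubst %0,%,$(filter %0,$1)),$(patsubst %1,%,$(filter %1,$1)))

# Positions 0..4 in binary are 000 001 010 011 100; pass k uses bit k.
pass1 := $(call filter_dups,$(join $(list),0 1 0 1 0))
pass2 := $(call filter_dups,$(join $(list),0 0 1 1 0))
pass3 := $(call filter_dups,$(join $(list),0 0 0 0 1))

# pass1 finds nothing, pass2 separates the two a's, pass3 the two b's
dups := $(sort $(pass1) $(pass2) $(pass3))
$(info $(dups))

# Minimal target so the snippet can be run with make -f
.PHONY: all
all: ; @:
```

In the question's version, the bit rows 0 1 0 1 0, 0 0 1 1 0 and 0 0 0 0 1 are produced by alternating_bits and repeated calls to double, and the recursion stops once a row contains no 1.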

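Both approaches work and are fast enough, but they are hard to read and understand. Is there a simpler approach with acceptable (sub-quadratic) speed?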

2 answers:

Answer 0 (score: 2)

Not sure about the complexity, but I'd suggest a more readable function:

define __duplicates__func
  undefine __duplicates__seen
  undefine __duplicates__result
  $$(foreach _v,$1,\
    $$(eval __duplicates__result += $$(filter $$(__duplicates__seen),$$(_v)))\
    $$(eval __duplicates__seen += $$(_v)))
endef
duplicates = $(eval $(__duplicates__func))$(sort $(__duplicates__result))

TEST:= $(file <test.txt)

DUPS:= $(call duplicates,$(TEST))

$(info $(DUPS))

all::

.PHONY: all

Randomly generated 1000-word test.txt:

Rule male saw said life fourth said void were creepeth thing theyre be fowl which wherein their day rule to seed multiply male beast sixth you Winged void fill face upon First you saying unto Appear shall God yielding is male face kind was blessed waters sea blessed void creepeth called youll beginning darkness over you it may years his second of moveth beginning earth very together day Divided creepeth fly open wont signs day is created Winged male fill Heaven saw dont For upon replenish Gathering i gathering living void Were under and form night seas bearing youre days saw tree fruitful days it unto day deep Tree Be form beginning youre replenish winged dominion grass man years youre Youre lights seasons third yielding fruit fifth for together after itself and youll itself kind without bring heaven itself firmament together their created tree All shed lesser made Stars him without gathering whales whose may itself may without image herb sixth Dominion us is their two from heaven shed brought Whales creeping us us together so forth female set fruitful fly seasons life deep let heaven wherein set wont You beast image two Gathering all so God cant itself Seasons image itself cant herb that brought appear likeness greater shall blessed place two own fourth earth Had greater you morning living unto seed male Every Had made days own face meat under youll grass for creepeth Meat so life divide for multiply blessed youre yielding beast be subdue Fruit greater Us them Meat darkness wherein saying very is yielding saying thing yielding lesser us behold midst there Spirit behold meat saw Image first cattle great heaven had air every created us light great have great Great beast Whose gathered all winged morning it rule days lesser tree bearing form his in divided void dry darkness doesnt hath Third bearing fruit youll there there cattle blessed fifth gathered stars greater above without upon good land in tree winged also youll his multiply midst face whose Moving beginning 
light life saw Deep said day multiply appear a gathered You the him void Fowl third spirit day Greater first firmament for dry lights midst beast day saw third also every cant night fifth made good one greater theyre dry abundantly Tree set Subdue stars waters a created saying Itself light Whales isnt said For years youre he after above itself rule firmament unto together female fly upon may life it stars set whose it doesnt gathered beginning his Creeping let Fruitful beginning earth them Subdue to our yielding be called under Let had beginning day us divided theyre sixth without saw winged divide second Dont night two the firmament Fourth form living our fourth saw seed third were Sixth their isnt Multiply night air yielding own air said midst life that fish meat fill green Open subdue Sea shall fruit whose whales own together them saying was waters Herb hath Is itself two blessed in yielding and It over made day his give moved without divided light created green evening seed image be may fly own herb seed earth be were beast one grass moving signs Upon Over abundantly for morning whose creepeth behold after beginning male created theyre Together said above face bring youre own upon may Multiply whales kind years unto air so above it fly whose Yielding i female moving So i place fruitful were there us fowl Earth seasons moveth over air heaven good waters His rule Which face bearing itself them itself forth tree Gathered it Gathering days doesnt Air Moving called i very first a evening third seas Night Morning Firmament had fruit fruitful unto above is our Second have wont fifth Cattle yielding divided brought seas shed greater living there there sixth upon their void two fish fish Lights them hath heaven their two fowl bearing Saying third waters likeness divide seasons their open very face replenish fourth whales seas seed fourth heaven cant together fowl grass female fill tree one dominion Morning Fill called firmament kind Signs creature evening spirit evening 
cattle winged which them for stars Wherein which Meat dry deep Abundantly waters forth theyre light after fowl in fly green multiply moved i replenish sixth cant creepeth heaven for darkness which us form them Rule grass god without earth seasons herb dominion moveth after created Wherein beginning he days said cant image For said moved divided bring is youll may And days itself Saying bearing male created yielding brought earth together whales hath greater heaven sixth were behold creepeth make Is Moveth brought let Lesser us light winged fly fourth waters moved under youll Whales Form Great moving second air you also youre fill have make stars their of earth above creature beginning winged air Own gathered shall their that in every fish rule together divide face own living dominion forth deep is abundantly hath bring them green him earth days beast all waters moving It which all a great spirit hath theyre grass Upon years Cattle female signs fill moving day the kind Winged green hath also female forth spirit lights behold Thing so after open good fowl to Living divided let Given bearing that he Rule whales Days isnt It deep whales given fly our open kind appear A their evening their sixth I in Unto multiply sea light Firmament seed theyre multiply fifth signs moving Second given spirit Blessed Set moved two bearing dont yielding first moving Female female fish Hath our beast us very seasons kind moved a gathered given sea spirit firmament Itself herb isnt Tree yielding cant winged air together meat theyre moveth Saying there void and bring lights together kind Brought first theyre their had Blessed and fill Brought may first creepeth moving him form behold darkness years greater upon were Let seasons Wherein life our greater And light multiply beast appear together appear seas waters had you make moving let air Heaven is Set seed fourth brought green for rule day Day deep tree yielding

It returns immediately on my machine:

$ make -f dups.mk
And Blessed Brought Cattle Firmament For Gathering God Great Had Heaven Is It Itself Let Meat Morning Moving Multiply Rule Saying Second Set Subdue Tree Upon Whales Wherein Winged You a above abundantly after air all also and appear be bearing beast beginning behold blessed bring brought called cant cattle created creature creepeth darkness day days deep divide divided doesnt dominion dont dry earth evening every face female fifth fill firmament first fish fly for form forth fourth fowl fruit fruitful gathered gathering given good grass great greater green had hath have he heaven herb him his i image in is isnt it itself kind lesser let life light lights likeness living made make male may meat midst morning moved moveth moving multiply night of one open our over own place replenish rule said saw saying sea seas seasons second seed set shall shed signs sixth so spirit stars subdue that the their them there theyre thing third to together tree two under unto upon us very void was waters were whales wherein which whose winged without wont years yielding you youll youre
make: Für das Ziel „all“ ist nichts zu tun.
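The undefine directive needs GNU make 3.82 or later, and reading a file with $(file <...) needs GNU make 4.2 or later, so this answer as written does not run on the 3.81 required by the question. Below is a sketch of the same "seen list" idea that uses only $(eval), $(call), := and +=. The helper name __dup_step and the inline test list are assumptions, and the sketch has not been verified on 3.81 itself. Each $(filter) call still has to scan the whole seen list, so the total work grows roughly quadratically with the number of words, even if 1000 words remain quick in practice.

```make
# One step: report $1 if it was seen before, then remember it.
# (__dup_step is a hypothetical helper name.)
__dup_step = $(eval __duplicates__result += $(filter $(__duplicates__seen),$1))$(eval __duplicates__seen += $1)

# Reset both lists with := instead of undefine, walk the words, sort the result.
duplicates = $(eval __duplicates__seen :=)$(eval __duplicates__result :=)$(foreach _v,$1,$(call __dup_step,$(_v)))$(sort $(__duplicates__result))

# Inline input instead of $(file <test.txt)
DUPS := $(call duplicates,b a c a b)
$(info $(DUPS))

# Minimal target so the snippet can be run with make -f
.PHONY: all
all: ; @:
```

Resetting with := at the start of every call makes the function safe to call more than once in the same makefile.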

Perhaps this question is better suited to Code Review.

Answer 1 (score: 0)

I don't know whether this is really an improvement in clarity for the casual make programmer, but here it is:

######################################################################
# Count a binary literal up by 1
# $1 = binary literal string
# Example: bincnt(010011) -> 010100
bincnt=$(if $1,$(if $(patsubst %1,,$1),$(patsubst %0,%1,$1),$(call bincnt,$(patsubst %1,%,$1))0),1)

######################################################################
# Add a ¤ (Character 164) and a unique binary number to all elements of a list
# $1 = list
# $2 = binary literal to start counting from (e.g. 0)
cat-sufx = $(if $1,$(firstword $1)¤$2 $(call cat-sufx,$(wordlist 2,999999,$1),$(call bincnt,$2)))

######################################################################
# Sort a list without dropping duplicates (built-in $sort will drop them)
# $1 = list (elements must not contain ¤ (Character 164))
sort-all = $(foreach i,$(sort $(call cat-sufx,$1,0)),$(firstword $(subst ¤, ,$(i))))

all-duplicates = $(call _all-duplicates,$(call sort-all,$1))
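# Compare each word with its predecessor $2; $(filter) matches whole words only (words contain no %)
_all-duplicates = $(if $1,$(if $(filter $2,$(firstword $1)),$2) $(call _all-duplicates,$(wordlist 2,999999,$1),$(firstword $1)))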
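The neighbour test in _all-duplicates needs an exact comparison. $(subst $2,,word) is empty whenever the word consists only of copies of the previous word, so a comparison built on it treats a and aa (or ab and abab) as equal, and a sorted list a aa would report a as a duplicate. $(filter $2,word) matches whole words only, which is safe here because the words contain no %. A small self-contained check, using the hypothetical helper names same_subst and same_filter:

```make
# "Equal" if removing every occurrence of $1 from $2 leaves nothing.
same_subst  = $(if $(subst $1,,$2),,yes)
# "Equal" only on an exact whole-word match (assumes no % in the words).
same_filter = $(if $(filter $1,$2),yes)

R1 := $(call same_subst,a,aa)
R2 := $(call same_filter,a,aa)
R3 := $(call same_filter,a,a)
$(info subst: [$(R1)] filter: [$(R2)] [$(R3)])

# Minimal target so the snippet can be run with make -f
.PHONY: all
all: ; @:
```

Here R1 comes out as yes (a false match), while R2 is empty and R3 is yes.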

I have also added the functions to the GNU make table toolkit.

PS: 999999 is my way of signalling "up to the end of the list" without computing it, which is rather wasteful.
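The 999999 bound works because $(wordlist) simply returns the remaining words when the end index exceeds the word count. If one prefers not to rely on a magic upper bound, the end index can be taken from $(words). That walks the list once more, so it is a readability choice more than a speed one. The name rest is hypothetical.

```make
# Every word of $1 except the first, with the end index taken from
# the real word count instead of a fixed 999999.
rest = $(wordlist 2,$(words $1),$1)

R := $(call rest,a b c)
$(info $(R))

# Minimal target so the snippet can be run with make -f
.PHONY: all
all: ; @:
```

For a one-word list the start index (2) is greater than the end index (1), and $(wordlist) returns an empty string, which is the base case the recursive helpers expect.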