Makefile pattern rule using regular expressions

Asked: 2019-02-28 01:27:31

Tags: regex makefile

I need to convert a bunch of files with names like this:

bar_S4_R1_001.fastq.gz

bar_S4_R2_001.fastq.gz

into files with names like this:

bar_R1_001.fastq.gz

bar_R2_001.fastq.gz

I would like to use a Makefile pattern rule for this.

I have a Makefile like this:

SHELL:=/bin/bash
files:
	touch foo_S13_R2_001.fastq.gz
	touch foo_S13_R1_001.fastq.gz
	touch bar_S4_R2_001.fastq.gz
	touch bar_S4_R1_001.fastq.gz
	touch baz_S9_R2_001.fastq.gz
	touch baz_S9_R1_001.fastq.gz

FILES:=$(shell find . -name "*.fastq.gz")
.PHONY: $(FILES)

demo: $(FILES)
# Recipe lines must start with a tab character.
$(FILES):
	@printf "source: $@ , target should be: " ; \
	echo "$@" | sed -e 's|\(_S[0-9]*\)\(_R[12]_001.fastq.gz\)$$|\2|'

Example:

$ make files
touch foo_S13_R2_001.fastq.gz
touch foo_S13_R1_001.fastq.gz
touch bar_S4_R2_001.fastq.gz
touch bar_S4_R1_001.fastq.gz
touch baz_S9_R2_001.fastq.gz
touch baz_S9_R1_001.fastq.gz

$ make demo
source: bar_S4_R1_001.fastq.gz , target should be: bar_R1_001.fastq.gz
source: foo_S13_R2_001.fastq.gz , target should be: foo_R2_001.fastq.gz
source: bar_S4_R2_001.fastq.gz , target should be: bar_R2_001.fastq.gz
source: baz_S9_R2_001.fastq.gz , target should be: baz_R2_001.fastq.gz
source: baz_S9_R1_001.fastq.gz , target should be: baz_R1_001.fastq.gz
source: foo_S13_R1_001.fastq.gz , target should be: foo_R1_001.fastq.gz

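Clearly, I already have the "source" and the "target" here in the "demo" recipe. How can I use a Makefile pattern rule for this, so that make builds the correctly named files?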
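
The sed expression from the demo recipe can also be checked on its own in a shell, outside of make. The sketch below assumes a POSIX shell and sed; `strip_sample_tag` is a hypothetical helper name that is not part of the question. Inside the Makefile the end-of-line anchor has to be written `$$`, because make turns `$$` into a single `$` before the recipe reaches the shell; here a single `$` is enough. The dots are escaped as a small tightening of the original pattern.

```shell
# strip_sample_tag (hypothetical name): drop the _S<digits> part of a file name.
strip_sample_tag() {
  printf '%s\n' "$1" | sed -e 's|_S[0-9]*\(_R[12]_001\.fastq\.gz\)$|\1|'
}

for f in bar_S4_R1_001.fastq.gz foo_S13_R2_001.fastq.gz; do
  printf '%s -> %s\n' "$f" "$(strip_sample_tag "$f")"
done
```

Names that do not end in `_S<digits>_R1_001.fastq.gz` or `_S<digits>_R2_001.fastq.gz` pass through unchanged.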

2 Answers:

Answer 0: (score: 1)

Fancy, but it works with GNU make, thanks to the (famous) foreach-eval-call combination and the string substitution and analysis functions:

SHELL         := /bin/bash
.DEFAULT_GOAL := all
FILES         := foo_S13_R2_001.fastq.gz foo_S13_R1_001.fastq.gz bar_S4_R2_001.fastq.gz bar_S4_R1_001.fastq.gz baz_S9_R2_001.fastq.gz baz_S9_R1_001.fastq.gz
TARGETS       :=

.PHONY: all

# $(1): name of the source file in X_Y_Z_T.fastq.gz form
# tmp: X Y Z T.fastq.gz
# tmp1: X
# tmp2: Y
# tmp3: Z
# tmp4: T.fastq.gz
define MYRULE
tmp  := $$(subst _, ,$(1))
tmp1 := $$(word 1,$$(tmp))
tmp2 := $$(word 2,$$(tmp))
tmp3 := $$(word 3,$$(tmp))
tmp4 := $$(word 4,$$(tmp))

$$(tmp1)_$$(tmp3)_$$(tmp4): $(1)
	@printf 'source: %s, target should be: %s\n' "$$<" "$$@"

TARGETS += $$(tmp1)_$$(tmp3)_$$(tmp4)
endef
$(foreach f,$(FILES),$(eval $(call MYRULE,$(f))))

all: $(TARGETS)

$(FILES):
	@touch $@

Demo:

$ make
source: foo_S13_R2_001.fastq.gz, target should be: foo_R2_001.fastq.gz
source: foo_S13_R1_001.fastq.gz, target should be: foo_R1_001.fastq.gz
source: bar_S4_R2_001.fastq.gz, target should be: bar_R2_001.fastq.gz
source: bar_S4_R1_001.fastq.gz, target should be: bar_R1_001.fastq.gz
source: baz_S9_R2_001.fastq.gz, target should be: baz_R2_001.fastq.gz
source: baz_S9_R1_001.fastq.gz, target should be: baz_R1_001.fastq.gz

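Note: the doubled $$ in the definition of MYRULE is essential, as are the := variable assignments (instead of = recursive assignments). $(call ...) expands the body once before $(eval ...) parses the result, so each $$ becomes a single $ and those references are only resolved when the generated text is evaluated.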
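
To see what the doubled $$ buys, it can help to print the text that $(call ...) produces before $(eval ...) ever sees it. The sketch below is only an illustration: it assumes GNU make is on the PATH, uses a trimmed-down copy of MYRULE with just two assignments, and `show_expansion` is a hypothetical helper name. It replaces $(eval ...) with $(info ...) so that the generated text is printed instead of parsed.

```shell
# show_expansion (hypothetical name): print what $(call MYRULE,...) expands to.
show_expansion() (
  dir=$(mktemp -d) || exit 1
  # The quoted 'EOF' stops the shell from expanding any $ in the Makefile text.
  cat > "$dir/Makefile" <<'EOF'
define MYRULE
tmp  := $$(subst _, ,$(1))
tmp1 := $$(word 1,$$(tmp))
endef
$(info $(call MYRULE,bar_S4_R1_001.fastq.gz))
all: ; @:
EOF
  make --no-print-directory -s -C "$dir" all
  rm -rf "$dir"
)

if command -v make >/dev/null 2>&1; then
  show_expansion
fi
```

In the printed text, $(1) has been replaced by the file name and every $$ has become a single $, so $(subst ...) and $(word ...) are still unexpanded at that point. With single $ signs in the definition they would be expanded during the call itself, before the tmp assignment had taken effect.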

Answer 1: (score: 1)

The first solution is vanilla make:

PAT_START := _S
PAT_MID := 0 1 2 3 4 5 6 7 8 9
PAT_END := 0_ 1_ 2_ 3_ 4_ 5_ 6_ 7_ 8_ 9_

SUBSTITUTE := _

PAT12 := $(foreach c,$(PAT_START),$(addprefix $(c),$(PAT_END)))
PAT123 := $(foreach c,$(foreach c,$(PAT_START),$(addprefix $(c),$(PAT_MID))),$(addprefix $(c),$(PAT_END)))

$(info $(PAT12))
$(info $(PAT123))

FILES := foo_S13_R2_001.fastq.gz foo_S13_R1_001.fastq.gz bar_S4_R2_001.fastq.gz bar_S4_R1_001.fastq.gz baz_S9_R2_001.fastq.gz baz_S9_R1_001.fastq.gz

NEW_FILES := $(strip $(foreach f,$(FILES),$(foreach p,$(PAT12) $(PAT123),$(if $(subst $(subst $(p),,$(f)),,$(f)),$(subst $(p),$(SUBSTITUTE),$(f))))))

$(info $(FILES))
$(info $(NEW_FILES))
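
The nested $(subst ...) inside the $(if ...) above works as a "does the file name contain this pattern?" test: removing the pattern from the name and then removing that result from the name leaves the name itself if the pattern occurred, and nothing if it did not. The sketch below isolates that test; it assumes GNU make is on the PATH, and `contains` and `check_contains` are hypothetical names that do not appear in the answer. The built-in $(findstring ...) function could serve as the condition as well.

```shell
# check_contains (hypothetical name): exercise the nested-subst containment test.
check_contains() (
  dir=$(mktemp -d) || exit 1
  cat > "$dir/Makefile" <<'EOF'
# contains (hypothetical name): expands to $(2) if $(1) occurs in $(2), else to nothing.
contains = $(subst $(subst $(1),,$(2)),,$(2))
f := bar_S4_R1_001.fastq.gz
$(info hit: [$(call contains,_S4_,$(f))])
$(info miss: [$(call contains,_S5_,$(f))])
$(info new: [$(if $(call contains,_S4_,$(f)),$(subst _S4_,_,$(f)))])
all: ; @:
EOF
  make --no-print-directory -s -C "$dir" all
  rm -rf "$dir"
)

if command -v make >/dev/null 2>&1; then
  check_contains
fi
```

Only the pattern that actually occurs in the name yields a non-empty condition, which is why the $(foreach ...) over all _S0_ to _S99_ candidates produces exactly one new name per file.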

The second approach uses gmtt, a GNU make library that greatly simplifies this kind of task:

include gmtt-master/gmtt-master/gmtt.mk

NEW_FILES := $(foreach f,$(FILES),$(call implode,$(call pick,1 4 5 6 7,$(call glob-match,$(f),*_S*_R[0-9]_*))))

$(info $(FILES))
$(info $(NEW_FILES))

Note that the gmtt solution uses a glob rather than a regular expression (RE).
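
The practical difference is that a glob `*` accepts any characters, whereas the regular expression from the question insists on digits after `_S`. The sketch below shows this with plain shell tools rather than gmtt, on the assumption that gmtt's glob matching behaves like shell globbing for this pattern; `matches_glob` and `matches_re` are hypothetical helper names.

```shell
# matches_glob / matches_re (hypothetical names): compare the two kinds of pattern.
matches_glob() {
  case "$1" in
    *_S*_R[0-9]_*) return 0 ;;
    *) return 1 ;;
  esac
}

matches_re() {
  printf '%s\n' "$1" | grep -Eq '_S[0-9]+_R[12]_001\.fastq\.gz$'
}

for f in bar_S4_R1_001.fastq.gz bar_Sx_R1_001.fastq.gz; do
  if matches_glob "$f"; then g=yes; else g=no; fi
  if matches_re "$f"; then r=yes; else r=no; fi
  printf '%s glob=%s re=%s\n' "$f" "$g" "$r"
done
```

For file names that follow the naming scheme in the question, both patterns select the same files; the glob is simply less strict about what sits between `_S` and `_R`.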