sed regex to match a subpath in a path with capture groups

Date: 2016-11-30 01:43:58

Tags: bash sed path

I have a list of dictionaries, each made of two files named index with the extensions {aff,dic}, like

dictionaries/dictionaries/bg_BG/index.dic
dictionaries/dictionaries/ca_ES/index.dic
dictionaries/dictionaries/cs_CZ/index.dic
dictionaries/dictionaries/da_DK/index.dic
...
dictionaries/dictionaries/bg_BG/index.aff
dictionaries/dictionaries/ca_ES/index.aff
dictionaries/dictionaries/cs_CZ/index.aff
dictionaries/dictionaries/da_DK/index.aff

I would like to copy them into another folder, naming each file after its subdirectory (e.g. it_IT), so as to get

myDicts/it_IT.dic
myDicts/it_IT.aff

I came up with this one-liner

for file in dictionaries/dictionaries/**/*.{dic,aff}; do echo ${file}; done

This lists the files in those folders; the for loop variable $file holds a path such as dictionaries/dictionaries/da_DK/index.aff.

Using sed I was then able to select (and strip out) the locale pattern, like

sed 's:[a-z][a-z][_-][A-Z][A-Z]::';

so that

for file in dictionaries/dictionaries/**/*.{dic,aff}; do echo ${file} | sed 's:[a-z][a-z][_-][A-Z][A-Z]::'; done

now prints

dictionaries/dictionaries//index.dic
dictionaries/dictionaries//index.dic
dictionaries/dictionaries//index.dic
...
dictionaries/dictionaries//index.aff
dictionaries/dictionaries//index.aff
dictionaries/dictionaries//index.aff

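As far as I understand, to print only a capture group sed has to match the whole line, spelling out both the captured and the non-captured parts (see here),

but I cannot figure out how to do that so that $file ends up as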

bg_BG.aff
ca_ES.aff
da_DK.aff
...
bg_BG.dic
ca_ES.dic
da_DK.dic

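The extension {aff,dic} should be appended as well. For scripting reasons I need to run this command inline.

[UPDATE] Thanks to the answer below, I came up with this solution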

for file in dictionaries/dictionaries/**/*.{dic,aff}; do echo $file | sed 's:.*\([a-z][a-z][_-][A-Z][A-Z]\)/index\(.*\):cp & myDicts/\1\2:' | sh; done

which does the job:

$ ls myDicts/
bg_BG.aff cs_CZ.aff de_AT.aff de_DE.aff en_AU.aff en_GB.aff en_ZA.aff eu_ES.aff gl_ES.aff it_IT.aff mn_MN.aff nl_NL.aff pl_PL.aff pt_PT.aff ru_RU.aff sl_SI.aff sv_SE.aff uk_UA.aff
bg_BG.dic cs_CZ.dic de_AT.dic de_DE.dic en_AU.dic en_GB.dic en_ZA.dic eu_ES.dic gl_ES.dic it_IT.dic mn_MN.dic nl_NL.dic pl_PL.dic pt_PT.dic ru_RU.dic sl_SI.dic sv_SE.dic uk_UA.dic
ca_ES.aff da_DK.aff de_CH.aff el_GR.aff en_CA.aff en_US.aff es_ES.aff fr_FR.aff hr_HR.aff lb_LU.aff nb_NO.aff nn_NO.aff pt_BR.aff ro_RO.aff sk_SK.aff sr_RS.aff tr-TR.aff vi_VN.aff
ca_ES.dic da_DK.dic de_CH.dic el_GR.dic en_CA.dic en_US.dic es_ES.dic fr_FR.dic hr_HR.dic lb_LU.dic nb_NO.dic nn_NO.dic pt_BR.dic ro_RO.dic sk_SK.dic sr_RS.dic tr-TR.dic vi_VN.dic

The only catch is that it does not capture these path patterns

dictionaries/dictionaries/ca_ES-valencia/
dictionaries/dictionaries/sr_RS-Latn
dictionaries/dictionaries/ca_ES-valencia/
dictionaries/dictionaries/sr_RS-Latn/
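
One hedged way to cover those directories while keeping the locale-shaped pattern is to let the capture group run on to the next slash with `[^/]*`. This is a minimal sketch, assuming those directories also hold index.dic and index.aff files and that the suffix never contains a `/`; the function name `locale_file` is hypothetical.

```shell
# Hypothetical helper: map ".../<locale>[-variant]/index.<ext>"
# to "<locale>[-variant].<ext>".
locale_file() {
  # [^/]* after the five-character locale lets the group absorb suffixes
  # such as "-valencia" or "-Latn"; the "/" before the group anchors it
  # at the start of the directory name.
  printf '%s\n' "$1" |
    sed 's:.*/\([a-z][a-z][_-][A-Z][A-Z][^/]*\)/index\(.*\):\1\2:'
}

locale_file dictionaries/dictionaries/ca_ES-valencia/index.dic
locale_file dictionaries/dictionaries/sr_RS-Latn/index.aff
locale_file dictionaries/dictionaries/tr-TR/index.dic
```

The three calls should print ca_ES-valencia.dic, sr_RS-Latn.aff and tr-TR.dic, so the plain five-character locales keep working as before.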

1 Answer:

Answer 0 (score: 1)

Here is one way:

echo dictionaries/dictionaries/da_DK/index.aff |
  sed 's:.*/\([^/]\+\)/index\(\..*\):\1\2:'

Output:

da_DK.aff
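
The same substitution can be wrapped in a small function to try it on several paths. This is only a sketch: the name `dict_name` is hypothetical, and `[^/][^/]*` stands in for the GNU-specific `\+` so that it also runs under a non-GNU sed. The `/` in front of the capture group matters, because a bare greedy `.*` would otherwise swallow most of the directory name and leave the group only its last character.

```shell
# Hypothetical helper: turn ".../<dir>/index.<ext>" into "<dir>.<ext>".
dict_name() {
  # ".*/" consumes everything up to the slash before the last directory;
  # "[^/][^/]*" then captures that directory name (one or more non-slash
  # characters), and "\(\..*\)" captures the extension including its dot.
  printf '%s\n' "$1" |
    sed 's:.*/\([^/][^/]*\)/index\(\..*\):\1\2:'
}

dict_name dictionaries/dictionaries/da_DK/index.aff
dict_name dictionaries/dictionaries/bg_BG/index.dic
```

The two calls should print da_DK.aff and bg_BG.dic.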

However, there is a faster way than the for loop:

find dictionaries/dictionaries -name "index.dic" -or -name "index.aff" |
  sed 's:dictionaries/dictionaries/\([^/]\+\)/index\(\..*\):mv & myDicts/\1\2:'

If that produces the commands you want, pipe it to sh:

mkdir myDicts
find dictionaries/dictionaries -name "index.dic" -or -name "index.aff" |
  sed 's:dictionaries/dictionaries/\([^/]\+\)/index\(\..*\):mv & myDicts/\1\2:' |
  sh
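
Piping generated text into sh works for these tidy names, but it would misbehave if a path ever contained spaces or shell metacharacters, and `mv` moves the originals where the question asked for copies. A hedged alternative sketch that copies instead and never re-parses the paths as shell code follows; `copy_dicts` is a hypothetical name, and paths are assumed not to contain newlines.

```shell
# Hypothetical helper: copy <src>/<locale>/index.{dic,aff}
# to <dest>/<locale>.{dic,aff}.
# The body runs in a subshell "( ... )" so its variables do not leak.
copy_dicts() (
  src=$1
  dest=$2
  mkdir -p "$dest" || exit 1
  find "$src" -type f \( -name 'index.dic' -o -name 'index.aff' \) |
    while IFS= read -r file; do
      dir=${file%/*}        # strip the trailing "/index.<ext>"
      locale=${dir##*/}     # last directory component, e.g. da_DK
      ext=${file##*.}       # dic or aff
      cp -- "$file" "$dest/$locale.$ext"
    done
)
```

It would be called as `copy_dicts dictionaries/dictionaries myDicts`. Because the locale is taken from the directory name with parameter expansion rather than a fixed-width regex, directories such as ca_ES-valencia are handled the same way as da_DK.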