Shell script to find, search and replace an array of strings in a file

Time: 2010-07-10 07:11:56

Tags: bash unix shell sed grep

This is linked to another question / code golf I asked: Code golf: "Color highlighting" of repeated text

My file 'sample1.txt' contains the following:

LoremIpsumissimplydummytextoftheprintingandtypesettingindustry.LoremIpsumhasbeentheindustry'sstandarddummytexteversincethe1500s,whenanunknownprintertookagalleyoftypeandscrambledittomakeatypespecimenbook.

I have a script that generates the following array of strings that occur in the file (only a few are shown for illustration):

LoremIpsum
LoremIpsu
dummytext
oremIpsum
LoremIps
dummytex
industry
oremIpsu
remIpsum
ummytext
LoremIp
dummyte
emIpsum
industr
mmytext

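Starting from the top of the list, I need to check whether "LoremIpsum" occurs in sample1.txt. If it does, I want to replace every occurrence of LoremIpsum with <T1>LoremIpsum</T1>. When the program then moves on to the next word, 'LoremIpsu', it should not match inside the <T1>LoremIpsum</T1> text that is now in sample1.txt. It should repeat this for every element of the 'array'. The next 'valid' entry would be 'dummytext', which should be tagged as <T2>dummytext</T2>.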
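To illustrate why the "should not match inside an existing tag" rule matters, here is a minimal sketch of the naive approach, which tags every entry unconditionally. The shortened text and the variable names (`text`, `words`, `naive`) are only for illustration and are not part of the original script.

```bash
#!/usr/bin/env bash
# Naive tagging: every entry is replaced, even if it now sits inside a tag.
text="LoremIpsumissimply"      # shortened sample, for illustration only
words=(LoremIpsum LoremIpsu)

naive=$text
tag=0
for w in "${words[@]}"; do
  tag=$((tag + 1))
  # Replace all occurrences of $w with <Tn>$w</Tn>
  naive=${naive//"$w"/<T$tag>$w</T$tag>}
done

printf '%s\n' "$naive"
# prints: <T1><T2>LoremIpsu</T2>m</T1>issimply
```

The second pass finds 'LoremIpsu' inside the text that was already wrapped in <T1>, which produces the nested tags I want to avoid.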

I think it should be possible to create a bash shell script solution for this, rather than relying on a perl / python / ruby program.

2 Answers:

Answer 0: (score: 1)

Pure Bash (no externals)

At the Bash command line:

$ sample="LoremIpsumissimplydummytextoftheprintingandtypesettingindustry.LoremIpsumhasbeentheindustry'sstandarddummytexteversincethe1500s,whenanunknownprintertookagalleyoftypeandscrambledittomakeatypespecimenbook."
$ # or: sample=$(<sample1.txt)
$ array=(
LoremIpsum
LoremIpsu
dummytext
...
)
$ tag=0; for entry in "${array[@]}"; do test="<[^>/]*>[^>]*$entry[^<]*</"; if [[ ! $sample =~ $test ]]; then ((tag++)); sample=${sample//${entry}/<T$tag>$entry</T$tag>}; fi; done; echo "Output:"; echo "$sample"
Output:
<T1>LoremIpsum</T1>issimply<T2>dummytext</T2>oftheprintingandtypesetting<T3>industry</T3>.<T1>LoremIpsum</T1>hasbeenthe<T3>industry</T3>'sstandard<T2>dummytext</T2>eversincethe1500s,whenanunknownprintertookagalleyoftypeandscrambledittomakeatypespecimenbook.
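For readability, the one-liner above can be unrolled into a small function. This is only a sketch of the same logic: the function name `tag_words`, the variable `inside`, and the shortened sample text are my own choices for illustration. It assumes the entries contain no regex metacharacters, since each entry is interpolated into the test pattern as-is.

```bash
#!/usr/bin/env bash
# tag_words TEXT WORD...
# Wraps each WORD in <Tn>...</Tn>, skipping words that already occur
# inside a previously inserted tag. Prints the tagged text.
tag_words() {
  local text=$1 entry inside tag=0
  shift
  for entry in "$@"; do
    # Matches when the entry sits between an opening tag and the next "</"
    inside="<[^>/]*>[^>]*${entry}[^<]*</"
    if [[ ! $text =~ $inside ]]; then
      tag=$((tag + 1))
      text=${text//"$entry"/<T$tag>$entry</T$tag>}
    fi
  done
  printf '%s\n' "$text"
}

# Shortened sample, for illustration only
sample="LoremIpsumissimplydummytextoftheprintingindustry.LoremIpsumhasbeentheindustry'sstandarddummytext"
tagged=$(tag_words "$sample" LoremIpsum LoremIpsu dummytext oremIpsum industry industr)
printf '%s\n' "$tagged"
```

Note that the test is all-or-nothing: if an entry occurs inside a tag anywhere in the text, that entry is skipped entirely, including any untagged occurrences elsewhere. For the sample data this is what is wanted, because the shorter entries only ever occur inside the longer ones.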

Answer 1: (score: 0)

Straightforward with Perl:

#! /usr/bin/perl

use warnings;
use strict;

my @words = qw/
  LoremIpsum
  LoremIpsu
  dummytext
  oremIpsum
  LoremIps
  dummytex
  industry
  oremIpsu
  remIpsum
  ummytext
  LoremIp
  dummyte
  emIpsum
  industr
  mmytext
/;

my $to_replace = qr/@{[ join "|" =>
                        sort { length $b <=> length $a }
                        @words
                     ]}/;

my $i = 0;
while (<>) {
  s|($to_replace)|++$i; "<T$i>$1</T$i>"|eg;
  print;
}

Sample run (wrapped to prevent horizontal scrolling):

$ ./tag-words sample.txt
<T1>LoremIpsum</T1>issimply<T2>dummytext</T2>oftheprintingandtypesetting<T3>indus
try</T3>.<T4>LoremIpsum</T4>hasbeenthe<T5>industry</T5>'sstandard<T6>dummytext</T
6>eversincethe1500s,whenanunknownprintertookagalleyoftypeandscrambledittomakeatyp
especimenbook.
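The Perl script above sorts the words longest first before joining them into one alternation. If the word list were not already in that order and you wanted the same ordering without leaving bash, something like the following sketch might do. The function name `sort_by_length_desc` and the example words passed to it are hypothetical; `mapfile` assumes bash 4 or later.

```bash
#!/usr/bin/env bash
# sort_by_length_desc WORD...
# Prints the words one per line, longest first. Words of equal length
# keep their input order. Empty words are dropped.
sort_by_length_desc() {
  local w len max=0
  local -a buckets=()
  for w in "$@"; do
    len=${#w}
    buckets[len]+="$w"$'\n'        # one newline-terminated bucket per length
    (( len > max )) && max=$len
  done
  for (( len = max; len > 0; len-- )); do
    printf '%s' "${buckets[len]}"
  done
}

# Read the sorted words back into an array
mapfile -t sorted < <(sort_by_length_desc industr LoremIpsum dummytext LoremIp)
printf '%s\n' "${sorted[@]}"
```

This uses one bucket per word length rather than a comparison sort, which keeps it free of external commands such as sort or awk.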

You might object that all the qr//@{[ ... ]} business is on the baroque side. You can get the same effect with the /o regex switch, as in:

# plain scalar rather than a compiled pattern
my $to_replace = join "|" =>
                 sort { length $b <=> length $a }
                 @words;

my $i = 0;
while (<>) {
  # o at the end for "compile (o)nce"
  s|($to_replace)|++$i; "<T$i>$1</T$i>"|ego;
  print;
}