Find and replace in a text file using strings from another text file with sed

Date: 2016-02-04 03:08:47

Tags: bash sed

I have two files as follows. The first is sample.txt:

new haven co-op toronto on $1245
joe schmo co-op powell river bc $4444

The second is locations.txt:

toronto
powell river
on
bc

We would like to use sed to produce a marked-up file, sample-new.txt, with a ; added before and after each of these strings, so that the final text would look like this:

new haven co-op ;toronto; ;on; $1245
joe schmo co-op ;powell river; ;bc; $4444

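Is this possible using bash? The actual files are much longer (thousands of lines in each case), but as this is a one-time job we are not too concerned about processing time.

---edited to add---

My original approach was something like this: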

cat locations.txt | xargs -i sed 's/{}/;/' sample.txt

But it only ran the script once per pattern, as opposed to the method you have proposed here.

2 answers:

Answer 0 (score: 2)

Using awk

awk 'NR==FNR{a[NR]=$0; next;} {for(i in a)gsub("\\<"a[i]"\\>",";"a[i]";"); print} '  locations.txt sample.txt

Using awk+sed

sed -f <(awk '{print "s|\\<"$0"\\>|;"$0";|g"}' locations.txt) sample.txt

Using pure sed

sed -f <(sed 's/.*/s|\\<&\\>|\;&\;|g/' locations.txt) sample.txt

(I will add an explanation of why these work once you have shown your coding attempt.)
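To make the two sed-based variants easier to follow, here is a minimal sketch that prints the intermediate script instead of passing it straight to `sed -f`. It assumes GNU sed (the `\<` and `\>` word-boundary markers are a GNU extension). The temporary directory and the helper name `gen_script` are made up for illustration, and the generator uses a plain `;` in the replacement rather than `\;`.

```shell
# Recreate the two files from the question in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'new haven co-op toronto on $1245' \
              'joe schmo co-op powell river bc $4444' > "$tmpdir/sample.txt"
printf '%s\n' 'toronto' 'powell river' 'on' 'bc' > "$tmpdir/locations.txt"

# Hypothetical helper: turn every location into one substitution command.
# "&" stands for the whole matched line; "\\" emits a literal backslash.
gen_script() {
  sed 's/.*/s|\\<&\\>|;&;|g/' "$1"
}

# Show the generated script. For the sample locations it is:
#   s|\<toronto\>|;toronto;|g
#   s|\<powell river\>|;powell river;|g
#   s|\<on\>|;on;|g
#   s|\<bc\>|;bc;|g
gen_script "$tmpdir/locations.txt"

# Save it and run it in a single pass over the sample file.
gen_script "$tmpdir/locations.txt" > "$tmpdir/script.sed"
sed -f "$tmpdir/script.sed" "$tmpdir/sample.txt"
```

Because every substitution lives in one script, sed reads sample.txt only once and applies all of the commands to each line, whereas the xargs attempt starts a separate sed per pattern and prints the whole file each time. Note that a location containing `|`, `&`, or regular-expression metacharacters would need escaping before it could be used this way.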

Answer 1 (score: 1)

Just to round out your options, you can also do this in pure bash, slowly:

#!/usr/bin/env bash

readarray -t places < t2

while read line; do
  for place in "${places[@]}"; do
      line="${line/ $place / ;$place; }"
  done
  echo "$line"
done < t1

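Note that this may not work as expected if your list includes places whose names contain other places, such as "niagara on the lake", which contains "on":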

foo bar co-op ;niagara ;on; the lake; on $1

Instead, you may want to do more targeted pattern matching, which is easier in awk:

#!/usr/bin/awk -f

# Collect the location list into the index of an array
NR==FNR {
  places[$0]
  next
}

# Now step through the input file
{

  # Handle two-letter provinces
  if ($(NF-1) in places) {
      $(NF-1)=";" $(NF-1) ";"
  }

  # Step through the remaining places doing substitutions as we find matches
  for (place in places) {
    if (length(place)>2 && index($0,place)) {
      sub(place,";"place";")
    }
  }

}

# Print every line
1

This works for me using the data in your question:

$ cat places
toronto
powell river
niagara on the lake
on
bc
$ ./tst places input
new haven co-op ;toronto; ;on; $1245
joe schmo co-op ;powell river; ;bc; $4444
foo nar co-op ;niagara on the lake; ;on; $1

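This may run into problems if your places file contains two-letter names that are not actually provinces. I am not sure whether any such places exist in Canada, but if they do, you will either need to adjust those lines manually or make the script more complex by handling provinces separately from cities.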
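As a rough sketch of that last option, the places could be split into two lists so the script no longer has to guess from the name length. Everything here is an assumption made for illustration: the file names `provinces` and `cities`, the helper name `mark_places`, and the idea that the province is always the second-to-last field. Cities are matched only as whole space-delimited words.

```shell
# Hypothetical input files, created in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'on' 'bc' > "$tmpdir/provinces"
printf '%s\n' 'toronto' 'powell river' 'niagara on the lake' > "$tmpdir/cities"
printf '%s\n' 'new haven co-op toronto on $1245' \
              'joe schmo co-op powell river bc $4444' \
              'foo bar co-op niagara on the lake on $1' > "$tmpdir/input"

# Hypothetical helper: mark_places PROVINCES CITIES INPUT
mark_places() {
  awk '
    # The first two files only fill the lookup arrays.
    FILENAME == ARGV[1] { provinces[$0]; next }
    FILENAME == ARGV[2] { cities[$0]; next }
    {
      # The province is assumed to be the second-to-last field.
      if (NF >= 2 && ($(NF-1) in provinces))
        $(NF-1) = ";" $(NF-1) ";"
      # Literal, whole-word match: pad the line and the city with spaces
      # so a city cannot match inside another word or a marked province.
      for (city in cities) {
        pos = index(" " $0 " ", " " city " ")
        if (pos)
          $0 = substr($0, 1, pos - 1) ";" city ";" substr($0, pos + length(city))
      }
      print
    }' "$1" "$2" "$3"
}

mark_places "$tmpdir/provinces" "$tmpdir/cities" "$tmpdir/input"
```

Using `index` and `substr` instead of `sub` keeps the city names literal, so a name containing regular-expression characters would not be misinterpreted. Only the first occurrence of each city on a line is marked.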