Question

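I'm working with a few files that contain URLs. I've tried sed, cut, and grep, but I'm really not sure how to approach this. I'd be very grateful if you could point me in the right direction.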

File 1:

https://example1.com
http://example2.com

File 2:

example1.com/example1-is-https-domain/
example1.com/need-https-in-front/
example1.com/match-me-to-https/
example1.com/example-https-not-http/
example2.com/im-an-http-domain/
example2.com/must-match-to-example2/
example2.com/path-of-http/
example2.com/http-domain-not-https/
example3.com/this-should-not-match/
example3.com/this-page-is-not-required/

Desired output:

https://example1.com/example1-is-https-domain/
https://example1.com/need-https-in-front/
https://example1.com/match-me-to-https/
https://example1.com/example-https-not-http/
http://example2.com/im-an-http-domain/
http://example2.com/must-match-to-example2/
http://example2.com/path-of-http/
http://example2.com/http-domain-not-https/

My approach:

I think I could use grep to match '//' and then use another command to paste together what was found? This is where I'm struggling. Any help is much appreciated.

Key point:

All I'm really trying to do is prepend the correct http or https to the domains that match between file 1 and file 2.

Answer 1

Let's see:

awk 'BEGIN{OFS=FS="/"}NR==FNR{k[$3]=$0;next}$1 in k{$1=k[$1];print}' file1 file2

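I think it does the job, but I haven't tested it here.

It uses the first file (NR == FNR) to build a dictionary keyed by domain. For the second file, it looks up each line's domain in that dictionary and, if it is present, replaces the domain with the full record from file 1, then prints the line.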
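To see the two-pass lookup in action, here is a minimal sketch that runs the same command on a cut-down copy of the sample data. The temporary directory and the `result` variable are assumptions for illustration only, not part of the original answer.

```shell
# Work in a throwaway directory so no real files are touched (illustrative only).
tmp=$(mktemp -d)
printf '%s\n' 'https://example1.com' 'http://example2.com' > "$tmp/file1"
printf '%s\n' 'example1.com/need-https-in-front/' \
              'example2.com/path-of-http/' \
              'example3.com/this-should-not-match/' > "$tmp/file2"

# Pass 1 (NR==FNR): map each domain ($3 when splitting on "/") to its full URL.
# Pass 2: if the first field is a known domain, swap in the full URL and print.
result=$(awk 'BEGIN{OFS=FS="/"}NR==FNR{k[$3]=$0;next}$1 in k{$1=k[$1];print}' \
  "$tmp/file1" "$tmp/file2")
printf '%s\n' "$result"
rm -rf "$tmp"
```

With this input the example3.com line is dropped, and the other two lines come out with their scheme prepended and the trailing slash intact, because assigning to `$1` rebuilds the record using OFS.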

Answer 2

This might work for you (GNU sed):

sed -r 's#.*//(.*)#s,^\1,&,p#' file1 | sed -nf - file2

This generates a sed script from file1 and applies it to file2.
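Printing the intermediate script makes the trick easier to follow. The sketch below runs only the first stage on the two sample URLs. It uses `-E`, which current GNU sed accepts as a synonym for `-r`, and the `script` variable is just an illustrative name.

```shell
# Stage 1 only: turn each URL from file1 into a substitute command.
# \1 is the bare domain (the text after "//"); & is the whole original URL.
script=$(printf '%s\n' 'https://example1.com' 'http://example2.com' |
  sed -E 's#.*//(.*)#s,^\1,&,p#')
printf '%s\n' "$script"
# s,^example1.com,https://example1.com,p
# s,^example2.com,http://example2.com,p
```

The second `sed -n` then runs those commands against file2, so only lines where a substitution succeeded are printed. The dots in the generated patterns are unescaped and match any character, which is harmless for this data.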

Answer 3

Your question is tagged bash, sed, and awk. I see sed and awk answers, so here's a pure bash (4+) one to complete the set.

In bash alone, with no external tools at all, you can do this:

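# Populate an associative array mapping each domain to its scheme (e.g. "https:")
declare -A s=()
while IFS=/ read -r -a a; do
  s["${a[2]}"]="${a[0]}"
done < file1

# Step through the URL list, printing only lines whose domain is in the map.
# The whole line is read unsplit so that a trailing slash is preserved.
while IFS= read -r line; do
  d=${line%%/*}   # the domain is everything before the first slash
  [[ ${s[$d]+z} ]] && printf '%s//%s\n' "${s[$d]}" "$line"
done < file2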

This is obviously not as sexy as kcoder's awk solution, nor as cryptic as potong's sed-script-that-writes-a-sed-script, but it should produce much the same result.
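The `${s[$d]+z}` test is what filters out unlisted domains: it expands to `z` only when the key is set, and to nothing otherwise. A small sketch of the idiom follows, assuming bash 4+ for associative arrays. The `scheme` array and the `has_key` helper are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
declare -A scheme=( [example1.com]="https:" [example2.com]="http:" )

# has_key is a hypothetical helper: it succeeds only if the key exists in scheme.
has_key() { [[ ${scheme[$1]+z} ]]; }

for d in example1.com example3.com; do
  if has_key "$d"; then
    echo "$d -> ${scheme[$d]}//$d"
  else
    echo "$d -> skipped"
  fi
done
```

Unlike testing `[[ -n ${scheme[$d]} ]]`, this form also treats a key whose value is the empty string as present.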

How to add a string only if a match is found between two files
