Question

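I have two files: one contains about 100 root domains, and the second contains only URLs. I need to filter that URL list into a third file that contains only the URLs whose domain appears in the domain list.

Example URL list: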

| URL                           |
| ------------------------------|
| http://github.com/name        |
| http://stackoverflow.com/name2|
| http://stackoverflow.com/name3|
| http://www.linkedin.com/name3 |

Example word list:

github.com
youtube.com
facebook.com

Result:

| http://github.com/name        |

My goal is to keep every whole line that contains one of the listed words. This is what I tried:

for i in $(cat domains.csv); 
 do grep "$i" urls.csv >> filtered.csv ; 
done

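The results were strange: I got some links, but not all of them contain a root domain from the first file. Seeing that bash was not doing what I wanted, I tried the same thing in Python and got better results, but writing a Python script takes more time than running a bash command.

How can I achieve this with bash?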

Answer 1

Using grep:

grep -F -f domains.csv urls.csv
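
For reference, `-F` makes grep treat every pattern as a fixed string, so the `.` in `github.com` is a literal dot rather than "any character", and `-f` reads one pattern per line from a file. A single grep call also prints each matching line once, whereas the loop appends a line again for every domain it matches. The following is a minimal sketch of writing the result to a third file; the `demo_*` file names and sample rows are hypothetical stand-ins for the files in the question.

```shell
# Hypothetical sample data mirroring the question (demo_* names are made up).
printf '%s\n' 'github.com' 'youtube.com' 'facebook.com' > demo_domains.csv
printf '%s\n' \
  '| http://github.com/name        |' \
  '| http://stackoverflow.com/name2|' \
  '| http://www.linkedin.com/name3 |' > demo_urls.csv

# -F: patterns are fixed strings, not regular expressions.
# -f: read the patterns, one per line, from demo_domains.csv.
# A single > creates the third file in one pass (no repeated >> appends).
grep -F -f demo_domains.csv demo_urls.csv > demo_filtered.csv

cat demo_filtered.csv
```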

Test results:

$ cat wordlist 
github.com
youtube.com
facebook.com

$ cat urllist 
| URL                           |
| ------------------------------|
| http://github.com/name        |
| http://stackoverflow.com/name2|
| http://stackoverflow.com/name3|
| http://www.linkedin.com/name3 |

$ grep -F -f wordlist urllist 
| http://github.com/name        |
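
One caveat: `grep -F` matches the domain anywhere in the line, so `github.com` would also select a URL on `notgithub.com` or one that merely mentions the domain in its query string. If that matters, a stricter variant can compare against the host part only. This is a rough sketch rather than a complete URL parser; it assumes every URL has a `scheme://` prefix, it also accepts subdomains such as `www.`, and the `demo_*` file names and sample rows are hypothetical.

```shell
# Hypothetical sample data (demo_* names and rows are made up for illustration).
printf '%s\n' 'github.com' 'youtube.com' 'facebook.com' > demo_domains.csv
printf '%s\n' \
  '| http://github.com/name             |' \
  '| http://notgithub.com/name2         |' \
  '| http://www.youtube.com/watch       |' \
  '| http://example.com/?u=facebook.com |' > demo_urls.csv

awk '
  # First file: remember every non-empty domain.
  NR == FNR { if ($0 != "") domains[$0]; next }
  {
    host = $0
    sub(/^[^:]*:\/\//, "", host)   # drop everything up to and including ://
    sub(/[\/:?# |].*$/, "", host)  # drop port, path, query and table padding
    for (d in domains) {
      n = length(host) - length(d)
      # Keep the line if the host is the domain itself or a subdomain of it.
      if (host == d || (n > 0 && substr(host, n) == "." d)) { print; next }
    }
  }
' demo_domains.csv demo_urls.csv > demo_filtered.csv

cat demo_filtered.csv
```

With this sample data only the `github.com` and `www.youtube.com` rows are kept; the `notgithub.com` and `example.com` rows, which plain `grep -F` would also return, are dropped.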
