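I have a CSV with multiple columns and rows [File1.csv].
I have another CSV file (with a single column) that lists specific words [File2.csv].
I want to delete a row in File1 if any of its columns matches any of the words listed in File2.
I originally used this: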
grep -v -F -f File2.csv File1.csv > File3.csv
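This works to a degree. The problem I ran into is with columns that contain more than one word (for example word1,word2,word3). File2 contains word2, but that row was not deleted.
I tried separating the words so they look like this: (word1,word2,word3), but the original command still did not work.
How can I delete rows that contain a word from File2, even when the column also contains other words?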
Answer 0 (score: 1)
One way using awk.
Contents of script.awk:
BEGIN {
    ## Split fields on a double quote optionally surrounded by spaces.
    FS = "[ ]*\"[ ]*"
}

## File with words: save them in a hash.
FNR == NR {
    words[ $2 ] = 1;
    next;
}

## File with multiple columns.
FNR < NR {
    ## Print the line unchanged and skip the check if the eighth field has
    ## no interesting value or this is the first line of the file (header).
    if ( $8 == "N/A" || FNR == 1 ) {
        print $0
        next
    }

    ## Split the field of interest on commas. Traverse it searching for a
    ## word saved from the first file. Print the line only if none is found.

    ## Change due to an error pointed out in comments.
    ##--> split( $8, array, /[ ]*,[ ]*/ )
    ##--> for ( i = 1; i <= length( array ); i++ ) {
    len = split( $8, array, /[ ]*,[ ]*/ )
    for ( i = 1; i <= len; i++ ) {
    ## END change.
        if ( array[ i ] in words ) {
            found = 1
            break
        }
    }
    if ( ! found ) {
        print $0
    }
    found = 0
}
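To see why the script looks at $8, it may help to check how that field separator numbers the fields. The following is a minimal sketch that assumes a made-up sample row (not the asker's data) in the same quoted layout as File1.csv:

```shell
# Hypothetical sample row, only for illustration.
line='"host","1.2.3.4","Linux","CVE-1,CVE-2","Name","High"'

# With the double quote as separator, $1 is the empty text before the first
# quote, even-numbered fields hold the values, and odd-numbered fields
# (from $3 on) hold the commas between them. The fourth column is therefore $8.
out=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "[ ]*\"[ ]*" } { print $2; print $8 }')
printf '%s\n' "$out"
```

For a file whose column of interest sits elsewhere, the field number would need to be adjusted the same way (column n lands in field 2n).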
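Assuming File1.csv and File2.csv have the content provided in the comments on Thor's answer (I suggest adding that information to the question), run the script like this: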
awk -f script.awk File2.csv File1.csv
This produces the following output:
"DNSName","IP","OS","CVE","Name","Risk"
"ex.example.com","1.2.3.4","Linux","N/A","HTTP 1.1 Protocol Detected","Information"
"ex.example.com","1.2.3.4","Linux","CVE-2011-3048","LibPNG Memory Corruption Vulnerability (20120329) - RHEL5","High"
"ex.example.com","1.2.3.4","Linux","CVE-2012-2141","Net-SNMP Denial of Service (Zero-Day) - RHEL5","Medium"
"ex.example.com","1.2.3.4","Linux","N/A","Web Application index.php?s=-badrow Detected","High"
"ex.example.com","1.2.3.4","Linux","CVE-1999-0662","Apache HTTPD Server Version Out Of Date","High"
"ex.example.com","1.2.3.4","Linux","CVE-1999-0662","PHP Unsupported Version Detected","High"
"ex.example.com","1.2.3.4","Linux","N/A","HBSS Common Management Agent - UNIX/Linux","High"
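Answer 1 (score: 0)
You can split the lines in File2.csv that contain more than one pattern.
The command below uses tr to turn a line containing word1,word2 into separate lines before they are used as patterns. The <() construct temporarily acts as a file/FIFO (tested in bash):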
grep -v -F -f <(tr ',' '\n' < File2.csv) File1.csv > File3.csv
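For shells without process substitution, the same idea can be sketched with a pipe. This is only a minimal sketch on made-up sample data (the file contents below are hypothetical), and it assumes a grep that accepts `-f -` to read patterns from standard input, as GNU grep does:

```shell
# Work in a scratch directory so no real files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data, only for illustration.
printf '%s\n' 'id,tags' '1,alpha' '2,beta' '3,gamma' > File1.csv
printf '%s\n' 'alpha,gamma' > File2.csv

# Put each comma-separated pattern on its own line, then hand the list
# to grep on standard input instead of through <( ).
tr ',' '\n' < File2.csv | grep -v -F -f - File1.csv > File3.csv
cat File3.csv
```

Because -F matches a pattern as a substring anywhere on the line, a short word can remove more rows than intended. GNU and BSD grep also accept -w to restrict matches to whole words, which may be worth adding depending on the data.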