Comparing txt file to third column in a csv bash

Time: 2016-04-04 18:45:37

Tags: bash shell awk

I am very new to programming and decided to learn bash as we deal with some log servers that are Linux/Unix based and so scripting is a bit easier.

I have a CSV file that is laid out as follows:

PC,user,file,path - all comma separated.

I have a white list of file names that are line separated. Some include spaces.

My goal is to compare the whitelist to column 3 of the CSV file and output all lines that don't match. I have tried a while read loop with an if statement but cannot seem to get it to work. I have tried a few awk one-liners, and got one from a past Stack Overflow post that outputs the lines that match the whitelist, but I cannot figure out how to reverse the logic so that it prints the non-matching lines instead. Code is below.

awk     'BEGIN{i=0}
       FNR==NR { a[i++]=$1; next }
        { for(j=0; j<i; j++)
            if(index($0,a[j]))
                {print $0;break}
        }' $whitelist $exestartup

I would like to stick to basic bash with no add-ons, and I am not opposed to using a loop/if statement instead of an awk one-liner.

Sample input/output:

whitelist.txt

program.exe
super program.exe
possible-program.exe

exestartup.csv

Asset1,user1,potato.exe,c:\users\user1
Asset2,user2,program.exe,c:\users\user2
Asset3,user3,possible-program.exe,c:\users\user3
Asset4,user4,super program.exe,c:\users\user4

Output

Asset1,user1,potato.exe,c:\users\user1

3 answers:

Answer 0 (score: 5)

awk to the rescue!

awk -F, 'FNR==NR{a[$1]; next} !($3 in a)' whitelist exestartup

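Set the field delimiter to a comma. While reading the first file, store every whitelist name as an array key; then, for each line of the second file, check whether its $3 field is one of those keys and print the line if it is not.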

If you post sample input and expected output you'll get more answers and perhaps better suggestions.

Using your input files:

$ awk -F, 'FNR==NR{a[$1]; next} !($3 in a)' whitelist.txt exestartup.csv

Asset1,user1,potato.exe,c:\users\user1
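The same one-liner can be spelled out with comments to make the two phases visible. This is only a sketch: the function name filter_not_whitelisted is made up for illustration, and it stores $0 rather than $1 as the key (an assumption on my part, so that a whitelist name containing a comma would still be kept whole). It also assumes the CSV has no quoted fields with embedded commas.

```shell
#!/usr/bin/env bash
# filter_not_whitelisted is a hypothetical helper name, not part of the original answer.
# Usage: filter_not_whitelisted WHITELIST_FILE CSV_FILE
filter_not_whitelisted() {
    awk -F, '
        # Phase 1: FNR equals NR only while the first file (the whitelist) is read.
        # Each whole line becomes a key of the array; "next" skips the rule below.
        FNR == NR { allowed[$0]; next }

        # Phase 2: for the CSV, a bare condition prints the line when it is true,
        # that is, when column 3 is not one of the stored keys.
        !($3 in allowed)
    ' "$1" "$2"
}
```

Called as filter_not_whitelisted whitelist.txt exestartup.csv, it should print the same single Asset1 line as above.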

If your awk is broken and the field values are disjoint (that is, no whitelisted name can appear in any other column), you can fall back to grep:

$ grep -vf whitelist.txt exestartup.csv

Asset1,user1,potato.exe,c:\users\user1
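To illustrate why the "disjoint" condition matters: grep tests each whitelist entry against the whole line, not just column 3. The sketch below uses two made-up helper names (by_grep and by_awk) and adds -F, which is not in the command above, so that entries are taken as literal strings rather than regular expressions.

```shell
#!/usr/bin/env bash
# by_grep and by_awk are hypothetical names used only for this comparison.

# Drops every line that contains a whitelist entry anywhere on the line.
# -F treats the entries as fixed strings (so "." is a literal dot).
by_grep() {
    grep -vFf "$1" "$2"
}

# Drops a line only when column 3 is exactly equal to a whitelist entry.
by_awk() {
    awk -F, 'FNR==NR{a[$0]; next} !($3 in a)' "$1" "$2"
}
```

With a row such as Asset5,user5,notes.txt,c:\tools\program.exe.bak and program.exe in the whitelist, by_grep would suppress the row because the name occurs in the path column, while by_awk would still print it.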

Answer 1 (score: 0)

Using join:

$ join -v 1 -t, -1 3 -2 1 -o 1.1,1.2,1.3,1.4 <(sort -t, -k3,3 exestartup.csv) <(sort whitelist.txt)
Asset1,user1,potato.exe,c:\users\user1

If the input files were already sorted on the join key (they do not appear to be in your example), this could simply be:

$ join -v 1 -t, -1 3 -2 1 -o 1.1,1.2,1.3,1.4 exestartup.csv whitelist.txt

Answer 2 (score: 0)

This solution uses only Bash 3 built-ins:

IFS=$'\n' read -d '' -r -a whitefiles < whitelist.txt

while IFS= read -r csvline || [[ -n $csvline ]] ; do
    IFS=, read -r pc user file path <<< "$csvline"
    for wfile in "${whitefiles[@]}" ; do
        [[ $wfile == "$file" ]] && continue 2
    done
    printf '%s\n' "$csvline"
done < exestartup.csv

A faster and cleaner solution is possible in Bash 4, because it has associative arrays.
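A minimal sketch of what such a Bash 4 version might look like follows. The function name print_unlisted and its variable names are assumptions made for illustration. The idea is to replace the inner for loop with a single key lookup in an associative array.

```shell
#!/usr/bin/env bash
# Requires Bash 4 or later (associative arrays via "local -A").
# print_unlisted is a hypothetical name. Usage: print_unlisted WHITELIST_FILE CSV_FILE
print_unlisted() {
    local whitelist_file=$1 csv_file=$2
    local -A allowed=()
    local name csvline pc user file path

    # Load each non-empty whitelist line as a key. Empty keys are skipped
    # because Bash rejects an empty associative-array subscript.
    while IFS= read -r name || [[ -n $name ]]; do
        if [[ -n $name ]]; then
            allowed[$name]=1
        fi
    done < "$whitelist_file"

    # Print every CSV line whose third field is not a key of the array.
    while IFS= read -r csvline || [[ -n $csvline ]]; do
        IFS=, read -r pc user file path <<< "$csvline"
        # ${allowed[$file]+x} expands to "x" only when the key exists.
        if [[ -n $file ]] && [[ -n ${allowed[$file]+x} ]]; then
            continue
        fi
        printf '%s\n' "$csvline"
    done < "$csv_file"
}
```

Each CSV line then costs one lookup regardless of how long the whitelist is, which is where the speed-up over the nested loop would come from.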