How can I use a column value in a CSV to determine the value of another column in the same CSV, using bash?

Time: 2017-03-28 20:46:01

Tags: bash csv awk sed grep

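I have a large CSV containing database information that I want to update.

I want to use the value in the email column (column 1) to determine the value of the segment column (column 4).

For example, if the email contains "nhs.net", the segment column should read "Health - NHS".

At the moment the segment column reads "Unknown Specialism", and I am not sure how to use bash to overwrite this value when a condition on another column is true.

Example

zoe.russell@nhs.net,zoe,russell,Unknown Specialism

would become:

zoe.russell@nhs.net,zoe,russell,Health - NHS

This is what I have so far... (my first bash script, and my first question here)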

#!/bin/bash

echo 'enter the email domain you are searching for in the email field'
read -r email
echo 'please enter the file you wish to search'
read -r file
echo "ok looking for $email in $file"
echo ...
# grep -E -i -- "$email" "$file"

# count the lines that match the domain (case-insensitive)
x=$(grep -E -i -c -- "$email" "$file")
echo "ok $x email addresses were found in $file"
echo 'here is a sample of the first 10 lines in the segment column'
# the segment column is column 4 in the example above
cut -d ',' -f4 "$file" | head -10

echo 'please enter the segment name you want to replace these with'
read -r new
echo "value will be replaced with $new"

2 Answers:

Answer 0: (score: 0)

Based on your requirements, you could use the following awk command:

$ cat file
zoe.russell@nhs.net, zoe, russell, Unknown Specialism
$ awk -F, '{if($1 ~ /nhs\.net/) {$4=" Health - NHS"}; print $0}' OFS=, file
zoe.russell@nhs.net, zoe, russell, Health - NHS
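The one-liner above hard-codes both the domain and the new segment. A minimal sketch of passing them in as shell variables, as the interactive script in the question would collect them, might look like the following. The file names `sample.csv` and `sample.csv.new`, the second data row, and the variable names are made up for illustration. `index()` is swapped in for the regex match so that the dot in the domain is matched literally.

```shell
#!/bin/bash
# Hypothetical sample data; a real file would be supplied by the user.
printf '%s\n' \
  'zoe.russell@nhs.net,zoe,russell,Unknown Specialism' \
  'john.smith@example.com,john,smith,Unknown Specialism' > sample.csv

domain='nhs.net'          # assumed input, e.g. from: read -r domain
segment='Health - NHS'    # assumed input, e.g. from: read -r segment

# index() is a literal substring search, so "." is not a regex wildcard.
# tolower() on both sides makes the comparison case-insensitive.
awk -F, -v OFS=, -v d="$domain" -v s="$segment" '
  index(tolower($1), tolower(d)) { $4 = s }   # overwrite the segment column
  { print }                                   # print every row, changed or not
' sample.csv > sample.csv.new

cat sample.csv.new
```

Writing to a second file rather than redirecting onto `sample.csv` itself matters here: the shell would truncate the input before awk reads it.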

Answer 1: (score: 0)

Using another file in awk for the segment-column replacement:

$ cat repl.txt
nhs.net, Health - NHS

Code:

$ awk '
BEGIN { FS=OFS="," }                            # delimiters are: ,
NR==FNR { a[$1]=$2; next }                      # read replacements in a hash
split($1,t,"@") && (t[2] in a) {                # get the domain name and use it
    $NF=a[t[2]]                                 # as reference to a hash
}
1' repl.txt file                                # 1 is the print command
zoe.russell@nhs.net, zoe, russell, Health - NHS
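As a further sketch of the same lookup idea, the mapping file can hold several domains, and the result can be written back over the original through a temporary file. Everything named here is invented for illustration: `repl.csv`, `data.csv`, the extra rows, and the "Education" segment. These files have no spaces after the commas, and the domain lookup is lowercased, which is an addition not present in the answer.

```shell
#!/bin/bash
# Hypothetical mapping and data files, created here only for the demo.
printf '%s\n' 'nhs.net,Health - NHS' 'example.ac.uk,Education' > repl.csv
printf '%s\n' \
  'zoe.russell@NHS.net,zoe,russell,Unknown Specialism' \
  'a.b@example.ac.uk,a,b,Unknown Specialism' \
  'c.d@other.org,c,d,Unknown Specialism' > data.csv

awk '
  BEGIN { FS = OFS = "," }
  NR == FNR { map[tolower($1)] = $2; next }   # first file: domain -> segment
  {
    n = split($1, part, "@")                  # part[n] is the text after the last @
    dom = tolower(part[n])
    if (n > 1 && (dom in map)) $4 = map[dom]  # replace only for known domains
    print                                     # rows with unknown domains pass through
  }
' repl.csv data.csv > data.csv.tmp && mv data.csv.tmp data.csv

cat data.csv
```

The `&& mv` step replaces `data.csv` only if awk exits successfully, so a failed run leaves the original file untouched.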