Question

I have two CSV files that share similar headers. sample_scv_1.csv is::

Transaction_date,Name,Payment_Type,Product
1/2/09 6:17,NA,Mastercard,NA
1/2/09 4:53,NA,Visa,NA
1/2/09 13:08,Nick,Mastercard,NA
1/3/09 14:44,Larry,Visa,Goods
1/4/09 12:56,Tina,Visa,Services
1/4/09 13:19,Harry,Visa,Goods

Similarly, sample_scv_2.csv is::

Transaction_date,Product,Name
1/2/09 6:17,Goods,Janis
1/2/09 4:53,Services,Nicola
1/2/09 13:08,Materials,Asuman

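Here the columns/fields Transaction_date, Product and Name are common to both files, and I want to replace the fields Product and Name in sample_scv_1.csv iff the transaction date matches in both files.

The output I need is::

Transaction_date,Name,Payment_Type,Product
1/2/09 6:17,Janis,Mastercard,Goods
1/2/09 4:53,Nicola,Visa,Services
1/2/09 13:08,Asuman,Mastercard,Materials
1/3/09 14:44,Larry,Visa,Goods
1/4/09 12:56,Tina,Visa,Services
1/4/09 13:19,Harry,Visa,Goods

This is a toy example; my real files are huge. For this example I can separate out the rows where the transaction dates match and use the column indexes to replace the fields with csvtool::

head -4 sample_scv_1.csv > temp1.csv
tail -3 sample_scv_1.csv > temp1_1.csv
#sudo apt-get install csvtool
csvtool pastecol 2,4 3,2 temp1.csv sample_scv_2.csv > temp1_2.txt
cat temp1_2.txt temp1_1.csv > sample_scv_1.csv

I can determine the rows where the transaction dates match, but I would not know the indexes of the two overlapping columns (Name and Product in the first file). One thing makes the problem simpler: all columns of sample_scv_2.csv are also present in sample_scv_1.csv. Is there an efficient way to do this?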

Answer 1

As long as the file with fewer columns or fields (sample_scv_2.csv) fits in memory, here is a solution in awk:

$ cat program.awk
BEGIN {FS=OFS=","}         # set the input and output field separators
NR==FNR {                  # for the first file
    p[$1]=$2               # store the product, use date as key
    n[$1]=$3               # name
    next                   # no more processing for the first file
} 
$1 in p {                  # if date found in first processed file
    if($2=="NA") $2=n[$1]  # replace NA with name
    if($4=="NA") $4=p[$1]  # replace NA with product
} 1                        # print the record

Run it (file2 is sample_scv_2.csv, file1 is sample_scv_1.csv):

$ awk -f program.awk file2 file1
Transaction_date,Name,Payment_Type,Product
1/2/09 6:17,Janis,Mastercard,Goods
1/2/09 4:53,Nicola,Visa,Services
1/2/09 13:08,Nick,Mastercard,Materials
1/3/09 14:44,Larry,Visa,Goods
1/4/09 12:56,Tina,Visa,Services
1/4/09 13:19,Harry,Visa,Goods
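
The program above hard-codes the column positions ($2, $3, $4), while the question says the indexes of the overlapping columns are not known in advance. One possible variant, as a rough sketch only, looks the columns up by header name instead. It assumes that the first column is the shared key in both files, that no field contains a quoted comma, and that the lookup file fits in memory. The function name `merge_by_header` and the `OVERWRITE` variable are made up for this illustration.

```shell
# Hypothetical helper: merge_by_header LOOKUP_CSV TARGET_CSV
# Fills columns of TARGET_CSV from LOOKUP_CSV, matching columns by header
# name and rows by the first column (assumed to be the key in both files).
# With OVERWRITE=1 every matching field is replaced, not only "NA" ones.
merge_by_header() {
    awk -v overwrite="${OVERWRITE:-0}" '
    BEGIN { FS = OFS = "," }
    NR == FNR {                          # first file: the lookup table
        if (FNR == 1) {                  # remember its header names by position
            for (i = 2; i <= NF; i++) name[i] = $i
            ncols = NF
            next
        }
        for (i = 2; i <= ncols; i++)     # store values under (key, header name)
            val[$1, name[i]] = $i
        seen[$1] = 1
        next
    }
    FNR == 1 {                           # second file: header name -> column index
        for (i = 2; i <= NF; i++) col[$i] = i
        print
        next
    }
    $1 in seen {                         # key is also present in the lookup file
        for (i = 2; i <= ncols; i++) {
            c = col[name[i]]             # empty if the target lacks this column
            if (c && (overwrite || $c == "NA"))
                $c = val[$1, name[i]]
        }
    }
    { print }
    ' "$1" "$2"
}
```

It would be called as `merge_by_header sample_scv_2.csv sample_scv_1.csv > merged.csv`. By default it keeps non-NA values, like the program above, so Nick stays Nick. Running it with `OVERWRITE=1` replaces every matching field, which gives Asuman as in the output the question asks for.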

Replace multiple fields/columns in one file with the contents of another if the headers are the same
