Merging CSV files with different columns

Time: 2018-06-03 07:42:11

Tags: bash awk sed

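I have multiple CSV files, each containing information elements and their values. The number of elements is not fixed and varies from file to file. I want to merge them into a single CSV file that presents the elements and their values as a table. For example, here are 3 CSV files: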

1.CSV

a   b   c   d
1   2   7   6

2.CSV

a   b   d
5   6   7

3.CSV

a   b   b   c
33  7.2 0   8

Expected output

Merge.csv

filename    a   b   b   c   d
1.CSV       1   2   ""  7   6
2.CSV       5   6   ""  ""  7
3.CSV       33  7.2 0   8   ""

I tried this with awk/bash but without success. Please let me know how to do this with awk. Thanks in advance.

1 answer:

Answer 0: (score: 0)

Setting aside that you have 2 columns with the same name, and assuming the CSV files are comma-separated, this code should help.

Let's say the CSV files have this format:

a,b,c,d 
1,2,7,6

Then:

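sep=","
mkdir -p results                     # the per-column files are written here
for file in *.csv; do                # glob directly instead of parsing ls
    # Number of columns = number of fields in the header line
    cols=$(head -n1 "$file" | awk -F"$sep" '{ print NF }')
    for col in $(seq 1 "$cols"); do
        colName=$(head -n1 "$file" | cut -d"$sep" -f"$col")
        # Start each column file with its header, then append this file's values
        [ ! -f "results/$colName" ] && echo "$colName" > "results/$colName"
        sed 1d "$file" | cut -d"$sep" -f"$col" >> "results/$colName"
    done
done

# Join the column files side by side
paste -d"," results/* > output.csv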
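
One thing to watch with the paste step: it joins the column files line by line, so if a column is missing from one input file, that column file is one line shorter and its later values slide up a row. A small sketch with made-up data (demo_paste_shift and the values are only for illustration), where column c is absent from the second of three files:

```shell
# Hypothetical demo, not part of the original answer.
demo_paste_shift() (
  dir=$(mktemp -d) || exit 1
  printf 'a\n1\n5\n33\n' > "$dir/a"   # column present in all three files
  printf 'c\n7\n8\n'     > "$dir/c"   # column missing from the second file
  # paste aligns purely by line number, so 8 ends up next to 5
  paste -d',' "$dir/a" "$dir/c"
  rm -rf "$dir"
)
```

Here the value 8, which belongs to the third file, is printed on the second file's row, and the last row gets an empty c.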

Updated: Try the following. This time I used CSV files laid out as in your example, and removed the duplicate column by renaming it to "e".

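#!/bin/bash

# The intermediate files and the merged result are written here
mkdir -p results

# Collect the distinct column names, one per line
head -q -n1 *.csv | tr "," "\n" | sort -u > results/headers

# Create an empty data file for each column
while read -r _header; do
  touch "results/$_header"
done < results/headers

# Start the final file with its header row
echo -n "filename," > results/merge.csv
paste -s -d"," results/headers >> results/merge.csv

# For each input file (glob directly instead of parsing ls)
for _file in *.csv; do
  cols=$(head -n1 "$_file" | awk -F"," '{ print NF }')
  echo -n "$_file," >> results/merge.csv

  # Append "filename,value" to the data file of each column in this file
  for _col in $(seq 1 "$cols"); do
    _fileToWrite=$(head -n1 "$_file" | cut -d"," -f"$_col")
    echo -n "$_file," >> "results/$_fileToWrite"
    sed 1d "$_file" | cut -d"," -f"$_col" >> "results/$_fileToWrite"
  done

  # Look up this file's value for every known column (empty if absent).
  # An exact match on the first field avoids 1.csv also matching 11.csv.
  while read -r _header; do
    data="$(awk -F"," -v f="$_file" '$1 == f { print $2 }' "results/$_header")"
    echo -n "$data," >> results/merge.csv
  done < results/headers

  echo >> results/merge.csv
done
# Drop the trailing comma left at the end of each line
sed -i 's/,$//' results/merge.csv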

RESULT

filename,a,b,c,d,e
1.csv,1,2,7,6 ,
2.csv,5,6,,7 ,
3.csv,33,7.2,8,,0
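
Since the question asked for awk, the same merge can also be sketched as a single awk program that keeps the duplicate column instead of renaming it. This is only a sketch under some assumptions: the files are comma-separated, each has a header line and one data row as in the example, repeated names are matched by position (the second "b" of one file goes with the second "b" of the others), and missing values are left as empty fields rather than "". The function name merge_csv is made up for illustration.

```shell
# merge_csv is a hypothetical helper name, not part of the original answer.
# Assumes comma-separated input with a header line and one data row per file.
merge_csv() {
  awk -F',' -v OFS=',' '
    # True when column key x must come after key y (by name, then occurrence)
    function after(x, y) {
      if ((names[x] "") != (names[y] "")) return ((names[x] "") > (names[y] ""))
      return (occ[x] > occ[y])
    }
    { sub(/[ \t\r]+$/, "") }            # drop trailing blanks and CR
    FNR == 1 {                          # header line of each file
      files[++nfiles] = FILENAME
      split("", seen)                   # reset the per-file name counters
      for (i = 1; i <= NF; i++) {
        n = ++seen[$i]
        key = $i SUBSEP n               # the second "b" becomes ("b", 2)
        colkey[FILENAME, i] = key
        if (!(key in names)) { names[key] = $i; occ[key] = n; keys[++nkeys] = key }
      }
      next
    }
    FNR == 2 {                          # the single data row
      for (i = 1; i <= NF; i++) value[FILENAME, colkey[FILENAME, i]] = $i
    }
    END {
      for (i = 2; i <= nkeys; i++) {    # insertion sort of the column keys
        k = keys[i]
        for (j = i - 1; j >= 1 && after(keys[j], k); j--) keys[j + 1] = keys[j]
        keys[j + 1] = k
      }
      line = "filename"
      for (c = 1; c <= nkeys; c++) line = line OFS names[keys[c]]
      print line
      for (f = 1; f <= nfiles; f++) {   # one output row per input file
        line = files[f]
        for (c = 1; c <= nkeys; c++)
          line = line OFS (((files[f], keys[c]) in value) ? value[files[f], keys[c]] : "")
        print line
      }
    }
  ' "$@"
}
```

It could be called as merge_csv 1.csv 2.csv 3.csv > merge.csv. The file names are printed exactly as they were passed on the command line, and the output should not be written under a name that the same glob would pick up on the next run.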