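I have multiple CSV files containing information elements and their values. The number of elements and their values are not fixed; they vary from file to file. I want to merge them into a single CSV file that presents the elements and their values as a table. For example, here are 3 CSV files -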
1.CSV
a b c d
1 2 7 6
2.CSV
a b d
5 6 7
3.CSV
a b b c
33 7.2 0 8
Expected output
Merge.csv
filename a b b c d
1.CSV 1 2 "" 7 6
2.CSV 5 6 "" "" 7
3.CSV 33 7.2 0 8 ""
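I tried this with awk / bash but without success. Please let me know how to do this with awk. Thanks in advance.
Answer 0 (score: 0)
Apart from the fact that you have 2 columns with the same name, and assuming comma-separated CSV files, this code should help.
Let's say the csv files have this format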
a,b,c,d
1,2,7,6
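Then
sep=","
# The per-column files are written here, so the directory must exist
mkdir -p results
for file in *.csv; do
    # Number of columns in the header line
    cols=$(head -n1 "$file" | awk -F"$sep" '{print NF}')
    for col in $(seq 1 $cols); do
        colName=$(head -n1 "$file" | cut -d"$sep" -f"$col")
        # Start each per-column file with its header
        [ ! -f "results/$colName" ] && echo "$colName" > "results/$colName"
        sed 1d "$file" | cut -d"$sep" -f"$col" >> "results/$colName"
    done
done
# paste joins the column files line by line, so rows only line up
# when every input file contains every column
paste -d"," results/* > output.csv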
Updated: try the following. This time I used the CSV files as you gave them in your example, and removed the duplicate column by renaming it to the letter "e"
#!/bin/bash
mkdir -p results
# Collect the unique column names from all header lines
head -q -n1 *.csv | tr "," "\n" | sort -u > results/headers
# Create one empty data file per column
while read -r _header; do
    : > "results/$_header"
done < results/headers
# Write the header line of the merged file
printf 'filename,' > results/merge.csv
paste -s -d"," results/headers >> results/merge.csv
# For each input file
for _file in *.csv; do
    cols=$(head -n1 "$_file" | tr "," "\n" | wc -l)
    printf '%s,' "$_file" >> results/merge.csv
    # Append "filename,value" to the data file of each column present
    for _col in $(seq 1 $cols); do
        _fileToWrite=$(head -n1 "$_file" | cut -d"," -f"$_col")
        printf '%s,' "$_file" >> "results/$_fileToWrite"
        sed 1d "$_file" | cut -d"," -f"$_col" >> "results/$_fileToWrite"
    done
    # Look up this file's value in every column (empty if the column is absent)
    while read -r _header; do
        data="$(awk -F, -v f="$_file" '$1 == f {print $2}' "results/$_header")"
        printf '%s,' "$data" >> results/merge.csv
    done < results/headers
    echo >> results/merge.csv
done
# Drop the trailing comma on each line
sed -i 's/,$//' results/merge.csv
RESULT
filename,a,b,c,d,e
1.csv,1,2,7,6 ,
2.csv,5,6,,7 ,
3.csv,33,7.2,8,,0
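Since the question asks for awk, the same merge can also be sketched in a single awk pass, without temporary per-column files. This is only a minimal sketch under some assumptions that are not in the original post: the inputs are comma-separated, each file has exactly one header line, and no field contains a quoted comma. The merge_demo directory and the array names (colOf, map, cell, label) are made up for illustration. A repeated header name (the second b in 3.csv) is treated as its own column instead of being renamed to e, and columns appear in the order they are first seen.

```shell
#!/bin/bash
# Demo input: comma-separated copies of the three files from the question.
mkdir -p merge_demo
printf 'a,b,c,d\n1,2,7,6\n'    > merge_demo/1.csv
printf 'a,b,d\n5,6,7\n'        > merge_demo/2.csv
printf 'a,b,b,c\n33,7.2,0,8\n' > merge_demo/3.csv

awk -F, -v OFS=, '
FNR == 1 {                       # header line of each input file
    split("", seen)              # reset the per-file name counters
    for (i = 1; i <= NF; i++) {
        n = ++seen[$i]           # n-th time this name appears in this file
        key = $i SUBSEP n
        if (!(key in colOf)) {   # first sighting: append a new output column
            colOf[key] = ++ncols
            name[ncols] = $i
        }
        map[i] = colOf[key]      # input field i goes to this output column
    }
    nf = NF
    next
}
{                                # data line
    f = FILENAME
    sub(/.*\//, "", f)           # keep only the base name as the row label
    label[++nrows] = f
    for (i = 1; i <= NF && i <= nf; i++)
        cell[nrows, map[i]] = $i
}
END {
    line = "filename"
    for (c = 1; c <= ncols; c++) line = line OFS name[c]
    print line
    for (r = 1; r <= nrows; r++) {
        line = label[r]
        for (c = 1; c <= ncols; c++) line = line OFS cell[r, c]
        print line
    }
}' merge_demo/1.csv merge_demo/2.csv merge_demo/3.csv > merge_demo/merge.csv

cat merge_demo/merge.csv
```

With the three demo files this prints the header filename,a,b,c,d,b followed by 1.csv,1,2,7,6, then 2.csv,5,6,,7, then 3.csv,33,7.2,8,,0. The input files are listed explicitly rather than with *.csv so that merge.csv is not read back in as input on a second run.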