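I have several .csv files containing data. The data provider created the files so that the first line gives each year only once, with empty values in between, and the second line gives the variable names. The data are in lines three through X.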
"year 1", , , "year 2", , ,"year 2", , ,
"Var1", "Var2", "Var3", "Var1", "Var2", "Var3", "Var1", "Var2", "Var3"
"ABC" , 1234 , 4567 , "DEF" , 789 , "ABC" , 1234 , 4567 , "DEF"
I'm new to shell programming, but it shouldn't be too complicated to write a script that outputs the following:
"Var1_year1", "Var2_year1", "Var3_year1", "Var1_year2", "Var2_year2", "Var3_year2", "Var1_year3", "Var2_year3", "Var3_year3"
"ABC" , 1234 , 4567 , "DEF" , 789 , "ABC" , 1234 , 4567 , "DEF"
Something like:
#!/bin/bash
FILES=/Users/pathTo.csvfiles/*.csv
for f in $FILES
do
echo "Processing $f file..."
# 1. Replace the second line with 'Varname_YearX' where YearX comes from the first line
cat ????
# 2. Delete first line
sed -i '' 1d $f
done
echo "Processing complete."
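Update: the .csv files have different numbers of lines. Only the first two lines need to be edited; the lines after them are data.
Answer 0 (score: 1)
If you want to merge the first and second lines of each CSV, try this.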
# No point in using a variable for the wildcard
for f in /Users/pathTo.csvfiles/*.csv
do
    awk 'BEGIN { FS = OFS = "," }  # Keep commas when a field is reassigned
    NR==1 {  # Collect first line
        # Squash quotes and spaces, so "year 1" becomes year1
        gsub(/[" ]/, "")
        for(i=1;i<=NF;++i)
            # Carry the previous year forward into empty fields
            y[i] = ($i != "" ? $i : y[i-1])
        next  # Do not fall through to print
    }
    NR==2 {  # Combine collected with current, as Varname_YearX
        gsub(/[" ]/, "")
        for(i=1;i<=NF;++i)
            $i = $i "_" y[i]
    }
    # Print everything (except first)
    1' "$f" > "$f.tmp" &&
    mv "$f.tmp" "$f"
done
The first loop simply copies the previous field's value into y[i] when the i-th field is empty.
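To look at the fill-forward step in isolation, here is a minimal sketch run on the sample year line from the question. The helper name fill_forward is hypothetical, and the sketch assumes that "empty" fields contain nothing but spaces and quotes.

```shell
# fill_forward: hypothetical helper name; fills empty comma-separated fields
# with the nearest non-empty value to their left
fill_forward() {
    awk 'BEGIN { FS = OFS = "," }
    {
        gsub(/[" ]/, "")                    # drop quotes and spaces
        for (i = 1; i <= NF; ++i) {
            if ($i == "") { $i = last }     # empty: reuse the last value seen
            else { last = $i }              # non-empty: remember it
        }
        print
    }'
}

printf '%s\n' '"year 1", , , "year 2", , ,"year 2", , ,' | fill_forward
# prints: year1,year1,year1,year2,year2,year2,year2,year2,year2,year2
```

The sample line ends with a trailing comma, so awk sees ten fields rather than nine; the extra one is harmless because the variable-name line only has nine.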
Answer 1 (score: 1)
Ugly code using csvtool, various standard tools, and bash:
i=file.csv
paste -d_ <(head -n 2 "$i" | tail -n 1 | csvtool transpose -) \
          <(head -n 1 "$i" | csvtool transpose - |
            sed '$d;s/ //;/^$/{g;b};h') |
csvtool transpose - | sed 's/[^,]*/"&"/g' | cat - <(tail -n +3 "$i")
Output:
"Var1_year1","Var2_year1","Var3_year1","Var1_year2","Var2_year2","Var3_year2","Var1_year2","Var2_year2","Var3_year2"
"ABC" , 1234 , 4567 , "DEF" , 789 , "ABC" , 1234 , 4567 , "DEF"