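I wrote a script that compares CSV files, but I still have one problem. Every output line must always have the form
5 values - space - 5 values
The problem is that some rows contain only 4 values, so I need to add a space column in place of the missing value.
Input:
File1: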
1,1,1,1
3,3,3,3,3
File2:
2,2,2,2
4,4,4,4,4
The current result is:
1,1,1,1, ,2,2,2,2
3,3,3,3,3, ,4,4,4,4,4
The result I need is:
1,1,1,1, , , 2,2,2,2,*space*
3,3,3,3,3, ,4,4,4,4,4
Here is my code:
#! /bin/bash
#------------------------------------------------------------------------------
#
# Description: Joins the files vertically based on the file extensions.
#
# Usage : ./joinfile directory1 directory2
#
#------------------------------------------------------------------------------
#---- Variables ---------------------------------------------------------------
resultfile="resultfile.csv"
#---- Main --------------------------------------------------------------------
# Checking if two arguments are provided, if not, display usage info, and exit.
if [ "$#" -ne 2 ]
then
    echo "Usage: $0 directory1 directory2"
    exit 1
fi
# Checking if any of the arguments provided is not a directory.
if [ ! -d "$1" -o ! -d "$2" ]
then
    if [ ! -d "$1" ]
    then
        echo "Error: $1 is not a valid directory"
    fi
    if [ ! -d "$2" ]
    then
        echo "Error: $2 is not a valid directory"
    fi
    exit 1
fi
# Removing the end slash from the arguments, if user had provided.
dir1=$(echo "$1" | sed 's/\/$//')
dir2=$(echo "$2" | sed 's/\/$//')
# Creating an array of files having ^ in their filenames.
filearr=( $(ls "$dir1"/*^* "$dir2"/*^*) )
# Getting filearr length.
filearrlen=${#filearr[@]}
# Creating an array of extensions.
for (( i=0; i<"$filearrlen"; i++ ))
do
    extarr+=(${filearr[i]##*^})
done
# Removing duplicates and the last extension from an extarr.
OLDIFS="$IFS"
IFS=$'\n'
newextarr=($(for i in "${extarr[@]}"; do echo "$i" | sed 's/\.[^.]*$//'; done | sort -du))
IFS="$OLDIFS"
# Getting newextarr length.
newextarrlen=${#newextarr[@]}
# Removing the previous outfile, if exists.
if [ -e "$resultfile" ]
then
    rm "$resultfile"
fi
# Joining the files vertically based on the extensions.
for (( i=0; i<"$newextarrlen"; i++ ))
do
    ext="${newextarr[i]}"
    echo "Handling ==> $ext"
    # Getting files with similar extensions.
    joinfiles=($(for j in "${filearr[@]}"; do echo "$j" | grep "\^$ext"; done))
    # Getting joinfiles array length.
    joinfileslen=${#joinfiles[@]}
    # Making a list of files to be pasted.
    for (( k=0; k<"$joinfileslen"; k++))
    do
        pastefiles+="${joinfiles[k]} "
        dos2unix "${joinfiles[k]}" 2>/dev/null
        cat "${joinfiles[k]}" | grep "^[ \t]*([0-9]* [0-9]*)," | sed 's/^[ \t]*//g' | sort -t, -k1 | cut -d',' -f1- >.ext_${k}_tags.csv
    done
    # Executing paste command.
    echo "==> ${ext}" >> "$resultfile"
    awk 'BEGIN{ FS = "," }
    {
        if(FNR == NR){ a[$1] = $0 } else{ b[$1] = $0 }
        for(i in a) {
            if (i in b)
            { c[i]=a[i]", ,"b[i]; if (a[i] == b[i] ) { c[i]="True,"c[i]; } else { c[i]="False,"c[i]; }
            } else { c[i]="False,"a[i]", ,"i",MISSING-MISSING-MISSING";}
        }
        for(i in b) {
            if (!(i in a)) { c[i]="False,"i",MISSING-MISSING-MISSING, ,"b[i]; }
        }
    }
    END{
        for (i in c){ print c[i]; }
    }
    ' ".ext_0_tags.csv" ".ext_1_tags.csv"|sort -t, -k1 >> "$resultfile"
    rm -f ".ext_0_tags.csv" ".ext_1_tags.csv"
done
#---- End ---------------------------------------------------------------------
Answer 0 (score: 1)
Here is one way to solve the problem:
awk -F, '{a[FNR]=a[FNR] sprintf("%s,%s,%s,%s,%s%s",$1,$2,$3,$4,($5==""?" ":$5),(NR==FNR?", ,":""))}
END{for(i=1;i<=FNR;++i)print a[i]}' file1.txt file2.txt
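This uses an array to join the two files together. The %s placeholders in the sprintf statement take the column values, or a space if the fifth column is empty. While the first file is being processed, the final %s is replaced with ", ," (the separator column); for the second file it is an empty string. After all records have been processed, the elements of the array are printed.
A few assumptions are made here: only the fifth column can be empty, and both files contain the same number of records.
Output: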
1,1,1,1, , ,2,2,2,2,
3,3,3,3,3, ,4,4,4,4,4
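If the two files might not have the same number of records, one possible variation is to keep each file's rows in its own array and print up to the longer of the two. The following is only a sketch under my own assumptions: the function name join_rows is hypothetical, and filling a missing row with five space columns is my choice, not something the question asks for.

```shell
#!/bin/sh
# Hypothetical helper: join two CSV files side by side, one row per line,
# even when the files have different numbers of records.
join_rows() {
    awk -F, '
    # Format the current record as 5 values; a blank 5th value becomes a space.
    function row() {
        return sprintf("%s,%s,%s,%s,%s", $1, $2, $3, $4, ($5 == "" ? " " : $5))
    }
    FNR == 1   { nfile++ }                              # a new input file starts
    nfile == 1 { left[FNR]  = row(); nleft  = FNR }     # rows of the first file
    nfile == 2 { right[FNR] = row(); nright = FNR }     # rows of the second file
    END {
        blank = " , , , , "     # assumption: a missing row becomes 5 space columns
        n = (nleft > nright) ? nleft : nright
        for (i = 1; i <= n; i++) {
            l = (i in left)  ? left[i]  : blank
            r = (i in right) ? right[i] : blank
            print l ", ," r
        }
    }' "$1" "$2"
}

# Usage: join_rows file1.txt file2.txt
```

Like the one-liner above, this still assumes that only the fifth column can be missing.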
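Answer 1 (score: 1)
Another awk approach:
Set the field separator and the output field separator to a comma.
If a line has fewer than 5 fields, set field 5 to a space.
Store each line in an array, indexed by record number.
While reading the second file, print the saved line from the first file, a space, and the current line.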
awk -F, -vOFS=, 'NF<5{$5=" "}{a[NR]=$0}FNR!=NR{print a[FNR]," ",$0}' file file2
1,1,1,1, , ,2,2,2,2,
3,3,3,3,3, ,4,4,4,4,4
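I am assuming each line has only 4 or 5 fields; if a line has fewer than 4 fields, this will not fill all of the empty fields with spaces. It also assumes there are only two files.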
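If lines with fewer than 4 fields can occur, a small loop can fill every blank field up to the fifth instead of setting only field 5. This is a rough sketch of that idea, not a tested replacement for the command above; the function name pad_and_join is hypothetical, and note that it also turns fields that are present but empty (as in "1,,1,1,1") into a space.

```shell
#!/bin/sh
# Hypothetical helper: same idea as the one-liner above, but every blank
# field in columns 1..5 is replaced by a space, not just column 5.
pad_and_join() {
    awk -F, -v OFS=, '
    # Assigning to a field beyond NF extends the record and rebuilds $0 with OFS.
    { for (i = 1; i <= 5; i++) if ($i == "") $i = " " }
    # Remember every padded line by its overall record number.
    { a[NR] = $0 }
    # In the second file, a[FNR] is the matching line of the first file.
    FNR != NR { print a[FNR], " ", $0 }
    ' "$1" "$2"
}

# Usage: pad_and_join file file2
```

It still assumes exactly two files with the same number of lines.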