I have some files in UNIX,
say Master_101.txt, Master_102.txt, Master_103.txt, Master_104.txt and Master_105.txt.
All the files have the same header and the same columns.
The header is: Id~name~desigantion~filename
Data of Master_101.txt is:
Id~name~desigantion~filename
11~abcd~SE~ Master_101.txt
12~efg~ASE~ Master_101.txt
Data of Master_102.txt is :
Id~name~desigantion~filename
21~abcd~SE~ Master_102.txt
22~efg~ASE~ Master_102.txt
Data of Master_103.txt is :
Id~name~desigantion~filename
11~abcd~SE~ Master_103.txt
32~efg~ASE~ Master_103.txt
Data of Master_104.txt is :
Id~name~desigantion~filename
41~abcd~SE~ Master_104.txt
42~efg~ASE~ Master_104.txt
Data of Master_105.txt is :
Id~name~desigantion~filename
51~abcd~SE~ Master_105.txt
52~efg~ASE~ Master_105.txt
53~efdgsdg~ASE-T~ Master_105.txt
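I need to merge all the files into Master.txt, excluding any file that has an "Id" value duplicated in another file.
Here, Id 11 is repeated in Master_101.txt and Master_103.txt, so only Master_102.txt, Master_104.txt and Master_105.txt should be merged into Master.txt. The final merged file Master.txt should look like: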
Id~name~desigantion~filename
21~abcd~SE~Master_102.txt
22~efg~ASE~Master_102.txt
41~abcd~SE~Master_104.txt
42~efg~ASE~Master_104.txt
51~abcd~SE~Master_105.txt
52~efg~ASE~Master_105.txt
53~efdgsdg~ASE-T~Master_105.txt
Answer 0 (score: 1)
#### get the list of regular files on which the operation is to be performed
ls -lrt inp/ | awk '$1~/^-/{print $9}' > out/filelist
#### files that share an Id with another file are collected here
: > out/droplist
#### starting the operation: n is the position of the current file in the list
n=0
while read -r line
do
n=`expr $n + 1`
#### compare the current file only with the files listed after it (tail -n +K starts at line K)
tail -n +`expr $n + 1` out/filelist | while read -r line2
do
#### checking the first field of one file against the other; FNR==1 skips the header row, which is identical in every file
if [ `awk -F"~" 'FNR==1 {next} NR==FNR {a[$1]; next} $1 in a' "inp/$line" "inp/$line2" | wc -l` -gt 0 ]
then
#### if a common record is found, note both files for exclusion instead of editing filelist while it is being read
printf '%s\n' "$line" "$line2" >> out/droplist
fi
done
done < out/filelist
#### drop the noted files from the list. grep -F treats the names as fixed strings, -x matches whole lines, -v inverts the match.
grep -vxFf out/droplist out/filelist > out/keeplist
#### concatenating the remaining files into finalfile, keeping only the first header
xargs awk 'FNR==1 && NR!=1 {next} {print}' < out/keeplist > inp/finalfile
Let me know if you have any questions. inp and out are directories.
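As an alternative to comparing the files pair by pair, the same result can probably be had by letting awk read the file list twice. The following is only a sketch under assumptions: it works in a throwaway directory created with mktemp, the sample rows are modelled on the question, the glob Master_[0-9]*.txt and the variable name pass are illustrative, and the output goes to Master.txt rather than inp/finalfile.

```shell
#!/bin/sh
# Assumption: a scratch directory with sample rows modelled on the question.
dir=$(mktemp -d)
cd "$dir" || exit 1
h='Id~name~desigantion~filename'
printf '%s\n' "$h" '11~abcd~SE~Master_101.txt' '12~efg~ASE~Master_101.txt' > Master_101.txt
printf '%s\n' "$h" '21~abcd~SE~Master_102.txt' '22~efg~ASE~Master_102.txt' > Master_102.txt
printf '%s\n' "$h" '11~abcd~SE~Master_103.txt' '32~efg~ASE~Master_103.txt' > Master_103.txt
printf '%s\n' "$h" '41~abcd~SE~Master_104.txt' '42~efg~ASE~Master_104.txt' > Master_104.txt
printf '%s\n' "$h" '51~abcd~SE~Master_105.txt' '52~efg~ASE~Master_105.txt' \
    '53~efdgsdg~ASE-T~Master_105.txt' > Master_105.txt

# "pass" is a hypothetical variable set between the two readings of the file list.
awk -F'~' '
    # Pass 1: remember which file used each Id, and flag both files
    # whenever the same Id turns up in a different file.
    pass == 1 {
        if (FNR == 1) next                      # skip the header row
        if (($1 in owner) && owner[$1] != FILENAME) {
            bad[owner[$1]] = 1
            bad[FILENAME] = 1
        } else {
            owner[$1] = FILENAME
        }
        next
    }
    # Pass 2: skip flagged files, print the header once, then all data rows.
    FILENAME in bad { next }
    FNR == 1 { if (!printed++) print; next }
    { print }
' pass=1 Master_[0-9]*.txt pass=2 Master_[0-9]*.txt > Master.txt

cat Master.txt
```

With these sample rows, Master_101.txt and Master_103.txt share Id 11, so only the rows of Master_102.txt, Master_104.txt and Master_105.txt end up under a single header. The sketch only looks for Ids shared between different files; an Id repeated inside one file is not flagged.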
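To compare the first two fields instead of only the first, change the awk command inside the if condition of the loop to the following. The comma subscript keeps the two fields separate in the key, and FNR==1 skips the header row:
awk -F"~" 'FNR==1 {next} NR==FNR {a[$1,$2]; next} ($1,$2) in a' "inp/$line" "inp/$line2"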
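One thing to watch when keying on two fields: if the fields are simply concatenated, different pairs can produce the same key, whereas awk's comma subscript joins them with SUBSEP and keeps them apart. A small sketch with made-up files a.txt and b.txt (both names and their contents are assumptions for illustration only):

```shell
#!/bin/sh
# Assumption: a scratch directory and two tiny made-up files.
cd "$(mktemp -d)" || exit 1
# Two different (Id, name) pairs whose plain concatenation is identical:
# "1" + "23" and "12" + "3" both give "123".
printf '%s\n' 'Id~name' '1~23' > a.txt
printf '%s\n' 'Id~name' '12~3' > b.txt

# Keyed on the concatenation $1 $2: the rows collide, so a false match is counted.
concat=$(awk -F'~' 'FNR==1 {next} NR==FNR {a[$1 $2]; next} ($1 $2) in a' a.txt b.txt | wc -l | tr -d ' ')

# Keyed on ($1,$2): awk inserts SUBSEP between the parts, so the pairs stay distinct.
pair=$(awk -F'~' 'FNR==1 {next} NR==FNR {a[$1,$2]; next} ($1,$2) in a' a.txt b.txt | wc -l | tr -d ' ')

echo "concatenated key matches: $concat"
echo "comma-subscript matches:  $pair"
```

Here the concatenated key reports one match and the comma subscript reports none, which is why the comma form is the safer choice whenever the fields could run together.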