Remove files that have duplicate values across files based on a column value, and merge the remaining files in Unix

Time: 2016-05-31 07:09:19

Tags: unix

I have some files in UNIX.

Say Master_101.txt, Master_102.txt, Master_103.txt, Master_104.txt, Master_105.txt

All the files have the same header and the same columns.

The header is: Id~name~desigantion~filename

Data of Master_101.txt is:

Id~name~desigantion~filename 

11~abcd~SE~ Master_101.txt

12~efg~ASE~ Master_101.txt

Data of Master_102.txt is:

Id~name~desigantion~filename

21~abcd~SE~ Master_102.txt

22~efg~ASE~ Master_102.txt

Data of Master_103.txt is:

Id~name~desigantion~filename

11~abcd~SE~ Master_103.txt

32~efg~ASE~ Master_103.txt

Data of Master_104.txt is:

Id~name~desigantion~filename

41~abcd~SE~ Master_104.txt

42~efg~ASE~ Master_104.txt

Data of Master_105.txt is:

Id~name~desigantion~filename

51~abcd~SE~ Master_105.txt

52~efg~ASE~ Master_105.txt 

53~efdgsdg~ASE-T~ Master_105.txt

I need to merge all the files into Master.txt, excluding any file that has a value in the "Id" column that also appears in another file.

Here, Id 11 is duplicated across Master_101.txt and Master_103.txt, so only Master_102.txt, Master_104.txt and Master_105.txt need to be merged into Master.txt. The data in the final merged file Master.txt should look like this:

Id~name~desigantion~filename

21~abcd~SE~Master_102.txt

22~efg~ASE~Master_102.txt

41~abcd~SE~Master_104.txt

42~efg~ASE~Master_104.txt

51~abcd~SE~Master_105.txt

52~efg~ASE~Master_105.txt

53~efdgsdg~ASE-T~Master_105.txt

1 Answer:

Answer 0: (score: 1)

#### get the list of regular files on which the operation is to be performed
ls -lrt inp/ | awk '$1~/^-/{print $9}' > out/filelist

#### files that share an Id with another file are collected here
: > out/dupfiles

#### starting the operation
n=0
while read -r line
do
#### compare each file only with the files listed after it, so n tracks the current line number
n=$((n + 1))
tail -n +$((n + 1)) out/filelist | while read -r line2
do
#### checking the first field of the file against the other file. FNR==1 skips the header row, which would otherwise match in every pair of files.
if [ $(awk -F"~" 'FNR==1 {next} NR==FNR {a[$1]; next} $1 in a' "inp/$line" "inp/$line2" | wc -l) -gt 0 ]
then
#### if a common record is found, note both files as duplicates. filelist itself is not edited while the loop is still reading it.
printf '%s\n' "$line" "$line2" >> out/dupfiles
fi
done
done < out/filelist

#### drop every flagged file from the list. grep -v inverts the match, -x matches whole lines, -F treats the names as fixed strings and -f reads them from a file.
grep -v -x -F -f out/dupfiles out/filelist > out/keeplist

#### concatenating the remaining files into finalfile, keeping the header only once. It is written to out/ so that a rerun does not pick it up as an input.
sed 's|^|inp/|' out/keeplist | xargs awk 'NR==1 || FNR>1' > out/finalfile

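Let me know if you have any questions. inp and out are directories: inp holds the input files and out holds the working files.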
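For comparison, the same result can probably be reached without the pairwise loop by letting awk read the file list twice. This is only a sketch under assumptions: the scratch directory, the helper `mk`, the `pass` variable and the output name out/Master.txt are illustrative, and the sample rows are the ones from the question written without the space after the last `~`.

```shell
#!/bin/sh
# Sketch only: rebuild the question's sample files in a scratch directory.
work=$(mktemp -d)
mkdir "$work/inp" "$work/out"
cd "$work" || exit 1

# mk FILE ROW... writes the shared header followed by the given rows (hypothetical helper)
mk() {
    f=$1; shift
    { echo 'Id~name~desigantion~filename'; printf '%s\n' "$@"; } > "inp/$f"
}
mk Master_101.txt '11~abcd~SE~Master_101.txt' '12~efg~ASE~Master_101.txt'
mk Master_102.txt '21~abcd~SE~Master_102.txt' '22~efg~ASE~Master_102.txt'
mk Master_103.txt '11~abcd~SE~Master_103.txt' '32~efg~ASE~Master_103.txt'
mk Master_104.txt '41~abcd~SE~Master_104.txt' '42~efg~ASE~Master_104.txt'
mk Master_105.txt '51~abcd~SE~Master_105.txt' '52~efg~ASE~Master_105.txt' \
                  '53~efdgsdg~ASE-T~Master_105.txt'

# Pass 1 remembers which file first used each Id and flags both files when
# another file reuses it. Pass 2 prints the unflagged files, header only once.
awk -F'~' '
    pass == 1 {
        if (FNR == 1) next                      # skip the header row
        if (($1 in owner) && owner[$1] != FILENAME) {
            bad[FILENAME]; bad[owner[$1]]       # both files share this Id
        } else {
            owner[$1] = FILENAME
        }
        next
    }
    FILENAME in bad { next }                    # pass 2: drop flagged files
    FNR == 1 { if (!printed++) print; next }    # keep only the first header
    { print }
' pass=1 inp/Master_*.txt pass=2 inp/Master_*.txt > out/Master.txt

cat out/Master.txt
```

The `pass=1` and `pass=2` operands are ordinary awk variable assignments that take effect when awk reaches them in the argument list, which is how the program tells the two passes apart. With the sample data, only the rows of Master_102.txt, Master_104.txt and Master_105.txt should remain under a single header.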

To compare the first two fields instead, change the awk command in the if condition to

awk -F"~" 'FNR==1 {next} NR==FNR {a[$1,$2]; next} ($1,$2) in a' "inp/$line" "inp/$line2"
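One caveat when two fields are combined into a single key: if the fields are simply concatenated, different pairs can collapse into the same string, whereas awk's comma subscript joins them with the built-in SUBSEP character. The sketch below illustrates this with two made-up rows; the file names a.txt and b.txt and the variables `plain` and `subsep` are hypothetical and exist only for this demonstration.

```shell
#!/bin/sh
# Sketch only: two hypothetical rows whose first two fields differ
# but concatenate to the same string "123".
demo=$(mktemp -d)
cd "$demo" || exit 1
printf '%s\n' '1~23~x' > a.txt
printf '%s\n' '12~3~y' > b.txt

# Key built by plain concatenation: "1" "23" and "12" "3" both become "123".
plain=$(awk -F'~' 'NR==FNR {a[$1 $2]; next} ($1 $2) in a {n++} END {print n+0}' a.txt b.txt)

# Key built with the comma subscript: the fields are joined with SUBSEP,
# so the two pairs stay distinct.
subsep=$(awk -F'~' 'NR==FNR {a[$1,$2]; next} ($1,$2) in a {n++} END {print n+0}' a.txt b.txt)

echo "matches with concatenated key: $plain"
echo "matches with comma subscript:  $subsep"
```

The concatenated key reports one false match here and the comma subscript reports none, so the comma form is the safer choice whenever the first field can vary in length.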