Mapping two strings from two different files that share the same number at the beginning of the line, in bash

Asked: 2012-11-22 18:36:56

Tags: bash

I have two files. One contains animal names:

martin@potato:~$ cat animals.txt 
98 white elefant
103 brown dog
111 yellow cat
138 blue whale
987 pink pig
martin@potato:~$

...and the other contains the places where they live:

martin@potato:~$ cat places.txt 
98 safari
99
103 home
105
109
111 flat
138 ocean
500
987 farm
989
martin@potato:~$ 

The number in front of each animal name in animals.txt points to the matching place in places.txt. The output should look like this:

martin@potato:~$ ./animals.sh
safari  white elefant
home    brown dog
flat    yellow cat
ocean   blue whale
farm    pink pig
martin@potato:~$ 

What is the most elegant way in bash to map the animal names to their places?

This is how I did it:

#!/usr/bin/env bash

#content of animals.txt file is stored into "$animals" variable using command substitution
animals=$(<animals.txt)

#content of places.txt file is stored into "$places" variable using command substitution
places=$(<places.txt)


#read(bash builtin) reads "$animals" variable line by line
while read line; do

    #"$animals_number" variable contains number in the beginning of the line; for example "98" in case of first line
    animals_number=$(echo "$line" | sed 's/ .*$//')
    #"$animals_name" variable contains string after number; for example "white elefant" in case of first line
    animals_name=$(echo "$line" | sed 's/[0-9]* //')
    #"$animals_place" variable contains the string after line which starts with "$animals_number" integer in places.txt file;
    #for example "safari" in case of first line
    animals_place=$(echo "$places" | grep -Ew "^$animals_number" | sed 's/.* //')
    #printf is used to map two strings which share the same integer in the beginning of the line
    printf '%s\t%s\n' "$animals_place" "$animals_name"

#process substitution is used to redirect content of "$animals" variable into stdin of while loop
done < <(echo "$animals")

However, I am not sure this is the most elegant or efficient way to solve the problem. Are there other approaches or tricks?

2 answers:

Answer 0 (score: 3)

while read id place;  do places[$id]=$place;                           done < places.txt
while read id animal; do printf '%s\t%s\n' "${places[$id]}" "$animal"; done < animals.txt
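
The two loops above can be packaged as a self-contained sketch. It stores every place in a sparse indexed bash array keyed by the numeric ID, then looks each animal up by its ID. The function name map_animals and the temporary-directory demo are illustrative assumptions, not part of the answer. It also assumes the IDs are plain decimal integers without leading zeros, since bash evaluates array subscripts arithmetically and would read something like 098 as an invalid octal number.

```shell
#!/usr/bin/env bash
# map_animals is a hypothetical helper name, not part of the answer above.
# Usage: map_animals PLACES_FILE ANIMALS_FILE
map_animals() {
    local places_file=$1 animals_file=$2
    local -a places=()      # sparse indexed array: numeric ID -> place
    local id place animal

    # First pass: remember the place for every ID (empty if the line has none).
    while read -r id place; do
        places[$id]=$place
    done < "$places_file"

    # Second pass: look up each animal's ID and print "place<TAB>animal".
    while read -r id animal; do
        printf '%s\t%s\n' "${places[$id]}" "$animal"
    done < "$animals_file"
}

# Demo with the sample data from the question, written to a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' '98 white elefant' '103 brown dog' '111 yellow cat' \
              '138 blue whale' '987 pink pig' > "$tmp/animals.txt"
printf '%s\n' '98 safari' '99' '103 home' '105' '109' '111 flat' \
              '138 ocean' '500' '987 farm' '989' > "$tmp/places.txt"
map_animals "$tmp/places.txt" "$tmp/animals.txt"
rm -rf "$tmp"
```

Each file is read exactly once and no external process is started per line, which is the main difference from the sed and grep pipeline in the question.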

Answer 1 (score: 0)

join <(sort animals.txt) <(sort places.txt) | sort -n

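Unfortunately, join has no "numeric sort" option, afaik; otherwise you could simply join the two files instead of sorting everything twice. (If you put leading zeros into the files, it would also work without the sorts.)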

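Recent Ubuntus, and probably other Linux distributions, set LANG to your presumed locale. That is deadly for sort, which, unlike join, is locale-aware; for the above to work, join and sort must agree on the sort order. If you get an error like: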

join: /dev/fd/63:5: is not sorted: 98 white elefant

then try this:

( export LC_ALL=C;  join <(sort animals.txt) <(sort places.txt) | sort -n )
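
The join command prints the ID first, followed by the remaining fields of each file, so its output does not yet match the "place, tab, animal" layout asked for in the question. A minimal sketch of one way to reshape it follows. The function name join_places is hypothetical, and the sketch assumes that every animal's ID has a non-empty, single-word place. It passes places.txt first so that the place lands in the second field, and sorts on the first field only (sort -k1,1) so that sort and join compare the same key.

```shell
#!/usr/bin/env bash
# join_places is a hypothetical helper name, not part of the answer above.
# Usage: join_places PLACES_FILE ANIMALS_FILE
join_places() {
    (
        export LC_ALL=C     # byte-wise ordering for both sort and join
        # Places go first, so each joined line reads "ID place animal words...".
        join <(sort -k1,1 "$1") <(sort -k1,1 "$2") |
            sort -n |
            while read -r id place animal; do
                # Drop the ID and print "place<TAB>animal".
                printf '%s\t%s\n' "$place" "$animal"
            done
    )
}

# Demo with the sample data from the question, written to a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' '98 white elefant' '103 brown dog' '111 yellow cat' \
              '138 blue whale' '987 pink pig' > "$tmp/animals.txt"
printf '%s\n' '98 safari' '99' '103 home' '105' '109' '111 flat' \
              '138 ocean' '500' '987 farm' '989' > "$tmp/places.txt"
join_places "$tmp/places.txt" "$tmp/animals.txt"
rm -rf "$tmp"
```

IDs that appear only in places.txt (99, 105 and so on) are dropped, because join prints only paired lines by default.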