Merging files with scientific-notation data in the first column, and how to use uniq

Question

Two questions about using the uniq command. Please help.

First question

Say I have two files:

$ cat 1.dat
0.1 1.23
0.2 1.45
0.3 1.67

$ cat 2.dat
0.3 1.67
0.4 1.78
0.5 1.89

Using cat 1.dat 2.dat | sort -n | uniq > 3.dat, I can merge the two files into one. The result is:

0.1 1.23
0.2 1.45
0.3 1.67
0.4 1.78
0.5 1.89
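The pipeline relies on uniq removing only adjacent identical lines, which is why the input is sorted first. A minimal Python sketch of that behaviour, assuming plain uniq with no options; the helper name uniq is hypothetical. It also shows why exact string comparison matters for both questions below.

```python
from itertools import groupby


def uniq(lines):
    """Emulate plain `uniq`: collapse runs of adjacent identical lines.

    Lines are compared as exact strings, so "0.1 1.23" and "1e-1 1.23" are
    different lines, and so are "0.3 1.67" and "0.3 1.57".
    """
    # groupby() with no key groups consecutive equal items; keep one per run.
    return [line for line, _group in groupby(lines)]


# Lines in the order `sort -n` produced them in the first example.
merged = ["0.1 1.23", "0.2 1.45", "0.3 1.67", "0.3 1.67", "0.4 1.78", "0.5 1.89"]
for line in uniq(merged):
    print(line)
```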

But if 1.dat contains scientific notation,

$ cat 1.dat
1e-1 1.23
0.2 1.45
0.3 1.67

the result is:

0.2 1.45
0.3 1.67
0.4 1.78
0.5 1.89
1e-1 1.23

This is not what I want. How can I make uniq understand that 1e-1 is a number and not a string?
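The misplaced line actually comes from the sorting step: sort -n does not recognize exponential notation, so 1e-1 is compared as if it were 1. A minimal Python sketch of ordering by the parsed value instead, assuming whitespace-separated columns; merge_sorted is a hypothetical helper and the sample lines are copied from the question.

```python
def merge_sorted(lines):
    """Sort lines by the numeric value of their first whitespace-separated field."""
    # float() understands both plain decimals ("0.2") and scientific notation ("1e-1").
    return sorted(lines, key=lambda line: float(line.split()[0]))


file1 = ["1e-1 1.23", "0.2 1.45", "0.3 1.67"]
file2 = ["0.3 1.67", "0.4 1.78", "0.5 1.89"]

for line in merge_sorted(file1 + file2):
    print(line)
```

The duplicate 0.3 line is still present here; removing it is the separate step the question wants uniq to perform.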

Second question

Same as above, but this time the first line of the second file, 2.dat, is slightly different (changed from 0.3 1.67 to 0.3 1.57):

$ cat 2.dat
0.3 1.57
0.4 1.78
0.5 1.89

Then the result is:

0.1 1.23
0.2 1.45
0.3 1.67
0.3 1.57
0.4 1.78
0.5 1.89

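My question is: how can I use uniq to look for duplicates in the first column only, keeping the value from the first file, so that the result is still: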

0.1 1.23
0.2 1.45
0.3 1.67
0.4 1.78
0.5 1.89
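A minimal Python sketch of the requested behaviour, assuming "duplicate" means an equal numeric value in column 1 and that the first file's line should win. It swaps uniq for a dictionary keyed by the parsed value; merge_unique is a hypothetical helper name.

```python
def merge_unique(primary, secondary):
    """Merge two lists of lines, de-duplicating on the numeric value of column 1.

    Lines from `primary` win: a line from `secondary` is kept only if its
    first-column value was not already seen in `primary`.
    """
    merged = {}
    for line in primary + secondary:
        key = float(line.split()[0])  # 0.1 and 1e-1 give the same key
        # setdefault keeps the first line seen for each key, i.e. the primary file's line.
        merged.setdefault(key, line)
    # Emit in ascending numeric order of the first column.
    return [merged[k] for k in sorted(merged)]


file1 = ["0.1 1.23", "0.2 1.45", "0.3 1.67"]
file2 = ["0.3 1.57", "0.4 1.78", "0.5 1.89"]

for line in merge_unique(file1, file2):
    print(line)
```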

Thanks.

A more complex test case

$ cat 1.dat
1e-6 -1.23
0.2 -1.45
110.7 1.55
0.3 1.67e-3
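This test case mixes negative values, an exponent in the second column, and a value above 100. Compared as plain strings, 1e-6 sorts after 110.7 (the character e comes after 1), so the first column has to be compared numerically. A minimal Python sketch, assuming exactly two whitespace-separated columns per line; parse_rows is a hypothetical helper name.

```python
def parse_rows(lines):
    """Parse 'x y' text lines into (x, y) float pairs, sorted numerically by x."""
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        # float() handles signs and exponents in either column: "-1.23", "1.67e-3", "1e-6".
        rows.append((float(fields[0]), float(fields[1])))
    return sorted(rows)  # tuples compare by x first, then by y


test_case = ["1e-6 -1.23", "0.2 -1.45", "110.7 1.55", "0.3 1.67e-3"]
for x, y in parse_rows(test_case):
    print(x, y)
```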

Answer 1

For the first part only:

cat 1.dat 2.dat | sort -g -u

1e-1 1.23
0.2 1.45
0.3 1.67
0.4 1.78
0.5 1.89

man sort

  -g, --general-numeric-sort
          compare according to general numerical value

  -u, --unique
          with -c, check for strict ordering; without -c, output only the first of an equal run
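Together, -g orders lines by the value of their leading number (unlike -n, it accepts exponents such as 1e-1), and -u keeps one line from each run of lines that compare equal. A rough Python emulation, assuming the compared value is the first whitespace-separated field; ties are broken by input order because Python's sorted() is stable, which is a property of this sketch rather than something claimed for sort itself. general_numeric_unique is a hypothetical name.

```python
from itertools import groupby


def general_numeric_unique(lines):
    """Roughly emulate `sort -g -u`: order by the numeric value of the leading
    field, then keep only the first line of each run of equal values."""
    def key(line):
        return float(line.split()[0])

    ordered = sorted(lines, key=key)  # stable: equal keys keep their input order
    return [next(group) for _, group in groupby(ordered, key=key)]


data = ["1e-1 1.23", "0.2 1.45", "0.3 1.67", "0.3 1.67", "0.4 1.78", "0.5 1.89"]
for line in general_numeric_unique(data):
    print(line)
```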

Answer 2

A one-liner in awk (GNU awk) solves both of your problems:

  awk '{a[$1*1];b[$1*1]=$0}END{asorti(a);for(i=1;i<=length(a);i++)print b[a[i]];}' file2 file1

Tested with your data. Note that I made file1 unsorted and put 1.57 in file2, as in your second question:

kent$  head *
==> file1 <==
0.3 1.67
0.2 1.45
1e-1 1.23

==> file2 <==
0.3 1.57
0.4 1.78
0.5 1.89

kent$  awk '{a[$1*1];b[$1*1]=$0}END{asorti(a);for(i=1;i<=length(a);i++)print b[a[i]];}' file2 file1
1e-1 1.23
0.2 1.45
0.3 1.67
0.4 1.78
0.5 1.89
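The one-liner keys each line by $1*1 (forcing a numeric conversion, so 1e-1 and 0.1 collide), reads file2 before file1 so that file1's line overwrites any duplicate key, and then prints in index order. A Python sketch of the same logic, assuming in-memory lists instead of files; merge_like_awk is a hypothetical name. One difference: it sorts the float keys numerically, whereas gawk's asorti() with no third argument orders the indices as strings, which matters for data like the more complex test case.

```python
def merge_like_awk(file2_lines, file1_lines):
    """Mirror the awk one-liner: key each line by its first column as a number,
    read file2 first and file1 second so file1's line overwrites any duplicate key,
    then return the stored lines in numeric key order."""
    latest = {}
    for line in file2_lines + file1_lines:
        latest[float(line.split()[0])] = line  # later assignment wins, like b[$1*1]=$0
    return [latest[k] for k in sorted(latest)]


file1 = ["0.3 1.67", "0.2 1.45", "1e-1 1.23"]
file2 = ["0.3 1.57", "0.4 1.78", "0.5 1.89"]

for line in merge_like_awk(file2, file1):
    print(line)
```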

Edit

To print 0.1 instead of 1e-1:

kent$ awk '{a[$1*1];b[$1*1]=$2}END{asorti(a);for(i=1;i<=length(a);i++)print a[i],b[a[i]];}' file2 file1
0.1 1.23
0.2 1.45
0.3 1.67
0.4 1.78
0.5 1.89
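This variant prints the normalized key (the number as awk converted it to a string) followed by column 2. A Python sketch of the same idea, assuming "%.6g" as a stand-in for awk's default number-to-string format; normalized_rows is a hypothetical name. With this format a very small value such as 1e-9 would still come out in exponent form (1e-09).

```python
def normalized_rows(file2_lines, file1_lines):
    """Store column 2 under the numeric value of column 1 (file1 overriding file2)
    and print the key in %.6g form instead of its original spelling."""
    col2 = {}
    for line in file2_lines + file1_lines:
        first, second = line.split()[:2]
        col2[float(first)] = second  # later file wins, as in the awk command
    # "%.6g" mirrors awk's default conversion format, turning 1e-1 into 0.1.
    return ["%.6g %s" % (k, col2[k]) for k in sorted(col2)]


file1 = ["0.3 1.67", "0.2 1.45", "1e-1 1.23"]
file2 = ["0.3 1.57", "0.4 1.78", "0.5 1.89"]

for row in normalized_rows(file2, file1):
    print(row)
```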

Edit 2

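Regarding precision: awk formats numbers with %.6g by default (CONVFMT when a number such as $1*1 becomes an array subscript, OFMT when print outputs a number), and you can change that. But if you want each line to keep its own precision, we need a little trick: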

(I put 1e-9 into file1 in place of 1e-1.)

kent$ awk '{id=sprintf("%.9f",$1*1);sub(/0*$/,"",id);a[id];b[id]=$2}END{asorti(a);for(i=1;i<=length(a);i++)print a[i],b[a[i]];}' file2 file1
0.000000001 1.23
0.2 1.45
0.3 1.67
0.4 1.78
0.5 1.89

If you want the same precision on every line:

kent$ awk '{id=sprintf("%.9f",$1*1);a[id];b[id]=$2}END{asorti(a);for(i=1;i<=length(a);i++)print a[i],b[a[i]];}' file2 file1
0.000000001 1.23
0.200000000 1.45
0.300000000 1.67
0.400000000 1.78
0.500000000 1.89
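The two Edit 2 commands differ only in how the key is formatted: fixed-point with 9 decimals, optionally followed by stripping trailing zeros (the sub(/0*$/,"",id) step). A Python sketch of both formats, assuming 9 digits as in the answer; fmt_variable and fmt_fixed are hypothetical names.

```python
def fmt_variable(x, digits=9):
    """Fixed-point with `digits` decimals, then strip trailing zeros,
    like sprintf("%.9f") followed by sub(/0*$/,"",id) in the awk version."""
    return ("%.*f" % (digits, x)).rstrip("0")


def fmt_fixed(x, digits=9):
    """Same number of decimals on every line."""
    return "%.*f" % (digits, x)


for value, col2 in [(1e-9, "1.23"), (0.2, "1.45"), (0.3, "1.67")]:
    print(fmt_variable(value), col2, "|", fmt_fixed(value), col2)
```

Like the awk regex, stripping zeros leaves a trailing dot on whole numbers (110.0 becomes "110."), and values smaller than 10^-9 round to zero at this precision.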

Answer 3

To convert scientific notation to decimal, I used Python:

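#!/usr/bin/env python3

import sys
import glob

# Expand any glob patterns given on the command line (e.g. *.dat), skipping the script name.
infiles = []
for pattern in sys.argv[1:]:
    infiles.extend(glob.glob(pattern))

for f in infiles:
    with open(f) as fd:
        for line in fd:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            # float() accepts both "0.1" and "1e-1"; print() shows the shortest repr (0.1 for 1e-1).
            data = [float(x) for x in fields]
            print(data[0], data[1])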

Output:

$ ./sn.py 1.dat 2.dat
0.1 1.23
0.2 1.45
0.3 1.67
0.3 1.67
0.4 1.78
0.5 1.89
