Question

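I need help extracting a column of numbers from many different files and displaying them in an output file.

Specifically, I want to extract the second column ($2) from each file (file1.txt, file2.txt, and so on) according to the value of the first column, and then put everything that was extracted into columns of a single file, out.txt.

The problem is that the first column has different intervals in each file:

file1: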

0.50 x1
1.25 x2
1.50 x3
1.75 x4
2.00 x5

file2:

0.25 y1
0.50 y2
1.00 y3
1.25 y4
2.00 y5

Desired output:

0.25    y1
0.50 x1 y2
1.00    y3
1.25 x2 y4
1.50 x3
1.75 x4
2.00 x5 y5

Answer 1

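What matters here is the format of the number, not its value. You should describe the number with a regular expression: a digit, a dot, and one or more digits: \d\.\d+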
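As a minimal sketch of that idea in Python (the language used for the solution further down), a compiled pattern can test whether a field has the expected shape. The helper name `looks_like_decimal` is made up for illustration, and the sketch uses `\d+` before the dot so that values such as 10.25 would also match.

```python
import re

# One or more digits, a literal dot, one or more digits.
DECIMAL = re.compile(r"\d+\.\d+")

def looks_like_decimal(field):
    """Return True if the whole field has the form digits.digits."""
    return DECIMAL.fullmatch(field) is not None

print(looks_like_decimal("0.50"))  # True
print(looks_like_decimal("x1"))    # False
```

Matching on the textual format avoids having to decide up front which numeric values may appear in the first column.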

If your files have more columns, the best way to extract one specific column is to run awk first. That way you can set the column number:

$ var=3
$ ls -l | awk '{print $'$var'}'
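For comparison, the same column selection can be sketched in Python. The function name `extract_column` and its 1-based `col` parameter are hypothetical, chosen to mirror awk's `$1`, `$2`, ... numbering. Lines with too few fields are skipped here, whereas awk would print an empty line for them.

```python
def extract_column(lines, col):
    """Yield the col-th whitespace-separated field (1-based) of each line."""
    for line in lines:
        fields = line.split()
        if len(fields) >= col:
            yield fields[col - 1]

sample = ["0.50 x1", "1.25 x2", "1.50 x3"]
print(list(extract_column(sample, 2)))  # ['x1', 'x2', 'x3']
```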

I don't think this is a job for bash (I'm not saying it is impossible), so I wrote my solution in Python:

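import re
import sys

num = {}  # first-column value (as float) -> list of second-column values
files = ['file1', 'file2']

for name in files:
    with open(name) as f:
        for line in f:
            # A decimal number, one whitespace character, then the rest of the line.
            cont = re.match(r"(\d+\.\d+)\s(.*)", line)
            if cont is not None:
                key = float(cont.group(1))
                num.setdefault(key, []).append(cont.group(2))

# Dict order is arbitrary before Python 3.7 and insertion order from 3.7 on;
# iterate over sorted(num) instead if the keys should come out in ascending order.
for key in num:
    sys.stdout.write(str(key) + ' ')
    print(num[key])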

file1:

0.5 x1
0.8 x2
0.3 x3

file2:

1.3 y1
0.5 y2
0.0 y3

Output:

0.5 ['x1', 'y2']
0.0 ['y3']
1.3 ['y1']
0.3 ['x3']
0.8 ['x2']
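That output is neither sorted nor aligned into columns the way the question asks. A possible variation, offered as a sketch rather than as part of the original answer, keeps one slot per input file for every key, sorts the keys numerically, and pads missing values with spaces. The names `merge_columns` and `width` are illustrative assumptions, and the data is passed in as lists of lines so the example runs without any files on disk. Keys are compared as text, so "0.5" and "0.50" would count as different keys; the question's files use a consistent two-decimal format.

```python
def merge_columns(sources, width=2):
    """Merge 'key value' lines from several sources into aligned rows.

    sources: one iterable of lines per input file.
    width:   column width used to pad missing values (assumed fixed).
    """
    table = {}  # key string -> list with one slot per source
    for idx, lines in enumerate(sources):
        for line in lines:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            key, value = fields[0], fields[1]
            row = table.setdefault(key, [""] * len(sources))
            row[idx] = value
    rows = []
    for key in sorted(table, key=float):  # numeric order, original text kept
        cells = [value.ljust(width) for value in table[key]]
        rows.append((key + " " + " ".join(cells)).rstrip())
    return rows

file1 = ["0.50 x1", "1.25 x2", "1.50 x3", "1.75 x4", "2.00 x5"]
file2 = ["0.25 y1", "0.50 y2", "1.00 y3", "1.25 y4", "2.00 y5"]
print("\n".join(merge_columns([file1, file2])))
```

With real files, each entry of `sources` could be an open file object, since the function only iterates over lines.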

Answer 2

You can do this with gawk and a 2-D array (true arrays of arrays need gawk 4.0 or later):

gawk 'FNR==NR{a[$1][0]=$2;a[$1][1]=1;next} {print $0,a[$1][0]; a[$1][1]=0;} END{for(i in a){if (a[i][1] == 1) print i,a[i][0];}}' file2 file1

Output:

0.50 x1 y2
1.25 x2 y4
1.50 x3 
1.75 x4 
2.00 x5 y5
1.00 y3
0.25 y1
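To make the logic of the one-liner easier to follow, here is a rough Python rendering of the same two-pass idea: load the first-named file (file2) into a lookup table with an "unused" flag, stream the second file (file1) while appending any match, then emit whatever was never matched. The function name `join_like_gawk` is hypothetical and the inputs are lists of lines rather than files. Two details differ from gawk: leftover rows come out in insertion order here, whereas the order of gawk's `for (i in a)` is unspecified, and unmatched rows carry no trailing space.

```python
def join_like_gawk(first, second):
    """Mimic the gawk script: `first` is held in memory, `second` is streamed."""
    table = {}  # key -> [value, unused_flag]
    for line in first:
        fields = line.split()
        if len(fields) >= 2:
            table[fields[0]] = [fields[1], True]
    out = []
    for line in second:
        fields = line.split()
        if not fields:
            continue
        entry = table.get(fields[0])
        if entry is None:
            out.append(line.strip())  # no partner in `first`
        else:
            out.append(line.strip() + " " + entry[0])
            entry[1] = False  # mark as already printed
    # Rows of `first` whose key never appeared in `second`.
    out.extend(key + " " + value
               for key, (value, unused) in table.items() if unused)
    return out

file1 = ["0.50 x1", "1.25 x2", "1.50 x3", "1.75 x4", "2.00 x5"]
file2 = ["0.25 y1", "0.50 y2", "1.00 y3", "1.25 y4", "2.00 y5"]
print("\n".join(join_like_gawk(file2, file1)))
```

As in the gawk version, the result is grouped (joined rows first, leftovers last) rather than sorted by the first column, so a final numeric sort is still needed to reach the layout the question asks for.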
