Find matching indices across multiple files and print

Time: 2018-05-23 16:18:50

Tags: indexing awk matching multiple-files

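I have 3 files as shown below; all 3 have the same number of columns and rows (more than several hundred). What I want: find the col/row positions where the numbers in File1 and File2 both fall within specific ranges, then keep the number in File3 at the same index and set all other numbers to "0". For example: in File1 and File2, only the numbers at col2/row2 meet the criteria (0 < 88 < 100, 0 < 6 < 10), so keep the number 8 from File3 and assign "0" to all the others. Is it possible to do this with awk? Or Python? Thanks.

File1: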

-10 -10 9 
-20 88 106 
-30 300 120

File2:

-6 0 -7
-5 6 1
-2 18 32

File3:

4 3 5 
6 8 8
10 23 14

Output:

0 0 0
0 8 0
0 0 0

2 answers:

Answer 0 (score: 1)

The following awk may help.

awk '
FNR==1                 { count++             }  ##The first line of each input file bumps count, so count is the number of the file being read.
count==1               {                        ##File1: mark the fields with 0 < value < 100.
   for(i=1;i<=NF;i++)  {                        ##Loop over every field of the current line.
     if($i>0 && $i<100){ a[FNR,i]++          }  ##Array a is indexed by line number and column number; it counts how many range checks that cell has passed.
}}
count==2               {                        ##File2: mark the fields with 0 < value < 10.
   for(i=1;i<=NF;i++)  {
     if($i>0 && $i<10) { a[FNR,i]++          }
}}
count==3               {                        ##File3: produce the output.
   for(j=1;j<=NF;j++)  { $j=a[FNR,j]==2?$j:0 }; ##Keep the field only if its cell passed both checks (a[FNR,j] is 2); otherwise set it to 0.
   print                                     }  ##Print the (possibly edited) line.
' File1 File2 File3                             ##The three input files, in this order.

The output is as follows.

0 0 0
0 8 0
0 0 0
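For readers who do not use awk, the same counting idea can be sketched in plain Python. This is only an illustration, not part of the answer above: the function name `filter_by_ranges` and the `hits` dictionary are hypothetical, and the data is typed in from the question rather than read from the files.

```python
# Sample data copied from the question (File1, File2, File3).
file1 = [[-10, -10, 9], [-20, 88, 106], [-30, 300, 120]]
file2 = [[-6, 0, -7], [-5, 6, 1], [-2, 18, 32]]
file3 = [[4, 3, 5], [6, 8, 8], [10, 23, 14]]


def filter_by_ranges(first, second, third):
    """Keep third[r][c] only where 0 < first[r][c] < 100 and 0 < second[r][c] < 10."""
    hits = {}  # (row, col) -> number of range checks passed, like the awk array a
    for grid, upper in ((first, 100), (second, 10)):
        for r, row in enumerate(grid):
            for c, value in enumerate(row):
                if 0 < value < upper:
                    hits[r, c] = hits.get((r, c), 0) + 1
    # A cell survives only if it passed both checks; everything else becomes 0.
    return [
        [value if hits.get((r, c)) == 2 else 0 for c, value in enumerate(row)]
        for r, row in enumerate(third)
    ]


result = filter_by_ranges(file1, file2, file3)
for row in result:
    print(*row)
```

As in the awk version, a cell is kept only when its counter reaches 2, meaning both range checks passed.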

Answer 1 (score: 1)

You already have a great awk answer.

Here is how to do this in Python with numpy.

First, read the files:

import numpy as np
arrays=[]
for fn in ('File1', 'File2', 'File3'):  # file names as given in the question
    with open(fn) as f:
        # one row per line, split on whitespace, parsed as floats
        arrays.append(np.array([line.split() for line in f],dtype=float))
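As a possible alternative for the reading step, `np.loadtxt` parses whitespace-separated numeric text directly. The sketch below is an assumption-laden illustration: it uses `io.StringIO` objects as stand-ins for the three files so that it runs on its own; with real files you would pass the file names instead.

```python
import io

import numpy as np

# Stand-ins for the three files (contents copied from the question).
# With real files, pass the file name to np.loadtxt instead of a StringIO.
texts = (
    "-10 -10 9\n-20 88 106\n-30 300 120\n",
    "-6 0 -7\n-5 6 1\n-2 18 32\n",
    "4 3 5\n6 8 8\n10 23 14\n",
)

# np.loadtxt returns a float array by default, one row per input line.
arrays = [np.loadtxt(io.StringIO(text)) for text in texts]
print(arrays[0].shape)
```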

Then create a mask matrix that encodes the required conditions:

mask=(arrays[0]>0) & (arrays[0]<100) & (arrays[1]>0) & (arrays[1]<10)
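To see which cells the mask selects, `np.argwhere` lists the positions of its `True` entries. The following is a small sketch on the sample data from the question; the names `in_first`, `in_second`, and `matches` are illustrative only.

```python
import numpy as np

# Sample data copied from the question (File1 and File2).
file1 = np.array([[-10, -10, 9], [-20, 88, 106], [-30, 300, 120]])
file2 = np.array([[-6, 0, -7], [-5, 6, 1], [-2, 18, 32]])

in_first = (file1 > 0) & (file1 < 100)   # True where 0 < File1 < 100
in_second = (file2 > 0) & (file2 < 10)   # True where 0 < File2 < 10
mask = in_first & in_second              # True only where both conditions hold

# np.argwhere returns zero-based (row, col) pairs of the True entries.
matches = np.argwhere(mask)
for r, c in matches:
    print(f"row {r + 1}, col {c + 1}")   # convert to the question's 1-based naming
```

Each condition alone matches more than one cell here; only row 2, col 2 satisfies both.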

Then multiply the third array (arrays[2] is the third file) by the mask:

>>> arrays[2] * mask.astype(float)
[[0. 0. 0.]
 [0. 8. 0.]
 [0. 0. 0.]]
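Multiplying by the mask works for ordinary numbers, but a NaN in the third file would stay NaN because NaN times 0 is NaN. One alternative, offered here only as a sketch, is `np.where`, which selects values instead of multiplying them. The arrays below are typed in from the question, and `with_nan` and `keep` are made-up illustration data.

```python
import numpy as np

# File3 from the question and the mask that the conditions produce for it.
values = np.array([[4, 3, 5], [6, 8, 8], [10, 23, 14]])
mask = np.array([[False, False, False],
                 [False, True, False],
                 [False, False, False]])

# Take the value where the mask is True, 0 elsewhere.
result = np.where(mask, values, 0)
print(result)

# Illustration of the NaN case with made-up data.
with_nan = np.array([np.nan, 8.0])
keep = np.array([False, True])
multiplied = with_nan * keep.astype(float)   # the NaN survives the multiplication
selected = np.where(keep, with_nan, 0.0)     # the NaN is replaced by 0.0
```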