Question

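I have 3 files, shown below. All 3 files have the same number of columns and rows (several hundred or more). What I want: find the col/row positions where the numbers in File1 and File2 both fall within specific ranges, keep the numbers in File3 at those same indices, and set all other numbers to 0. For example: across File1 and File2, only the numbers at col2/row2 meet the criteria (0 < 88 < 100 and 0 < 6 < 10), so the 8 from File3 is kept and 0 is assigned to every other position. Is it possible to do this with awk? Or Python? Thanks.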

File1:

-10 -10 9 
-20 88 106 
-30 300 120

File2:

-6 0 -7
-5 6 1
-2 18 32

File3:

4 3 5 
6 8 8
10 23 14

Output:

0 0 0
0 8 0
0 0 0

Answer 1

The following awk may help.

awk '
FNR==1   { count++ }                        ## First line of each input file: count becomes the index of the current file (1, 2 or 3).
count==1 {                                  ## File1: mark every cell whose value lies strictly between 0 and 100.
   for(i=1;i<=NF;i++) {
     if($i>0 && $i<100) { a[FNR,i]++ }      ## The array key is (line number, column number).
   }
}
count==2 {                                  ## File2: mark every cell whose value lies strictly between 0 and 10.
   for(i=1;i<=NF;i++) {
     if($i>0 && $i<10)  { a[FNR,i]++ }
   }
}
count==3 {                                  ## File3: a[line,column]==2 means the cell passed both range checks.
   for(j=1;j<=NF;j++) { $j=a[FNR,j]==2?$j:0 }  ## Keep the value if both checks passed, otherwise set it to 0.
   print                                    ## Print the (possibly modified) line.
}
' File1 File2 File3                         ## The three input files, in this order.

The output is as follows.

0 0 0
0 8 0
0 0 0
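
For readers who prefer Python without third-party libraries, here is a minimal sketch of a related approach. It does not store counts in an array as the awk script does. Instead it reads the three files in lockstep, line by line. The function name filter_rows is hypothetical, and the sample data is supplied through StringIO objects so the snippet is self-contained. With real files you would pass open file handles instead.

```python
from io import StringIO


def filter_rows(file1, file2, file3):
    """Yield File3 rows, with cells set to 0 unless both range checks pass."""
    for l1, l2, l3 in zip(file1, file2, file3):
        row = []
        for x, y, z in zip(l1.split(), l2.split(), l3.split()):
            # Ranges taken from the question: 0 < File1 < 100 and 0 < File2 < 10.
            keep = 0 < float(x) < 100 and 0 < float(y) < 10
            row.append(z if keep else "0")
        yield " ".join(row)


# Sample data from the question, standing in for the real files.
f1 = StringIO("-10 -10 9\n-20 88 106\n-30 300 120\n")
f2 = StringIO("-6 0 -7\n-5 6 1\n-2 18 32\n")
f3 = StringIO("4 3 5\n6 8 8\n10 23 14\n")

rows = list(filter_rows(f1, f2, f3))
print("\n".join(rows))
```

Because only one line per file is held in memory at a time, this should scale to files with many rows.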

Answer 2

You already have a great awk answer.

Here is how to do this in Python with numpy.

First, read the files:

import numpy as np
arrays=[]
for fn in ('File1', 'File2', 'File3'):
    with open(fn) as f:
        arrays.append(np.array([line.split() for line in f],dtype=float))

Then create a boolean mask that is True wherever both conditions hold:

mask=(arrays[0]>0) & (arrays[0]<100) & (arrays[1]>0) & (arrays[1]<10)
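
To see which positions the mask selects, here is a small sketch on the sample data from the question. The arrays are typed inline rather than read from files, and the names f1, f2 and hits are illustrative only. np.argwhere lists the (row, column) indices where the mask is True; note that these are zero-based.

```python
import numpy as np

# Sample data from the question (File1 and File2).
f1 = np.array([[-10, -10, 9], [-20, 88, 106], [-30, 300, 120]], dtype=float)
f2 = np.array([[-6, 0, -7], [-5, 6, 1], [-2, 18, 32]], dtype=float)

# True only where File1 is in (0, 100) AND File2 is in (0, 10).
mask = (f1 > 0) & (f1 < 100) & (f2 > 0) & (f2 < 10)

# Zero-based (row, column) pairs of the matching cells.
hits = np.argwhere(mask)

print(mask)
print(hits)  # [[1 1]], i.e. row2/col2 in the one-based terms of the question
```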

Then multiply the third array (arrays[2] is the third file) by the mask:

>>> print(arrays[2] * mask.astype(float))
[[0. 0. 0.]
 [0. 8. 0.]
 [0. 0. 0.]]
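
The steps can also be put together as one function. The sketch below makes two substitutions that are not part of the answer above. It reads the data with np.loadtxt, and it selects values with np.where instead of multiplying by the mask. Multiplying a negative File3 value by 0.0 gives -0.0 in floating point, which np.where avoids. The function name filter_third and its range parameters are hypothetical. StringIO objects stand in for the files here; np.loadtxt also accepts file names.

```python
import io
import numpy as np


def filter_third(f1, f2, f3, range1=(0, 100), range2=(0, 10)):
    """Keep f3 values where f1 and f2 lie strictly inside their ranges, else 0."""
    a, b, c = (np.loadtxt(f, ndmin=2) for f in (f1, f2, f3))
    mask = (a > range1[0]) & (a < range1[1]) & (b > range2[0]) & (b < range2[1])
    return np.where(mask, c, 0)


# Sample data from the question, standing in for File1, File2 and File3.
file1 = io.StringIO("-10 -10 9\n-20 88 106\n-30 300 120\n")
file2 = io.StringIO("-6 0 -7\n-5 6 1\n-2 18 32\n")
file3 = io.StringIO("4 3 5\n6 8 8\n10 23 14\n")

out = filter_third(file1, file2, file3)

# Format as integers, one row per line, to match the requested output.
buf = io.StringIO()
np.savetxt(buf, out, fmt="%d")
text = buf.getvalue()
print(text, end="")
```

The fmt="%d" format assumes the File3 values are whole numbers, as in the sample. For fractional data a format such as "%g" would be needed.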

Find matching indices based on multiple files and print
