Compare each line in two text files

Time: 2016-08-04 13:45:29

Tags: python file text comparison abaqus

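I hope you can help me with this problem:

I have two text files of roughly 10,000 lines each (call them File1 and File2), produced by an FEM analysis. The files are structured as follows:

File1: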

        ....
     Element           Facet            Node  CNORMF.Magnitude     CNORMF.CNF1     CNORMF.CNF2     CNORMF.CNF3          CPRESS         CSHEAR1         CSHEAR2  CSHEARF.Magnitude    CSHEARF.CSF1    CSHEARF.CSF2    CSHEARF.CSF3

         881               3            6619              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.
         881               3            6648              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.
         881               3            6653              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.
         930               3            6452              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.
         930               3            6483              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.
         930               3            6488              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.
        1244               2            7722              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.
        1244               2            7724              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.
        1244               2            7754              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.
        2380               2            3757     304.326E-06    -123.097E-06    -203.689E-06    -189.663E-06     564.697E-06    -281.448E-06     22.5357E-06     152.710E-06     144.843E-06    -26.7177E-06    -40.3387E-06
        2380               2            3826     226.603E-06    -85.9859E-06    -161.270E-06    -133.967E-06     270.594E-06    -134.865E-06     10.7988E-06     117.700E-06     116.217E-06    -4.67318E-06    -18.0298E-06
        2380               2            3848     10.4740E-03    -2.01174E-03    -6.63900E-03    -7.84743E-03     771.739E-06    -384.638E-06     30.7983E-06     5.24148E-03     5.12795E-03    -541.446E-06    -940.251E-06
        2894               2            8253              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.
        2894               2            8255              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.
        2894               2            8270              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.
        3372               2            5920              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.
        3372               2            5961     52.7705E-03     12.2948E-03    -40.8019E-03    -31.1251E-03     7.36309E-03    -2.56505E-03    -502.055E-06     18.8167E-03     17.9038E-03     2.12060E-03     5.38774E-03
        3372               2            5996              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.
        3936               3            6782              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.
        3936               3            6852              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.
        3936               3            6857              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.
        3937               4            6410              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.
        3937               4            6452              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.
        3937               4            6488              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.
        3955               2            6940              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.
        3955               2            6941              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.
        3955               2            6993              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.
        4024               2            8027              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.
        4024               2            8050              0.              0.              0.              0.              0.              0.              0.              0.              0.              0.              0. 
        ....

File2:

        ....
        Node  COORD.Magnitude     COORD.COOR1     COORD.COOR2     COORD.COOR3     U.Magnitude            U.U1            U.U2            U.U3
           1         131.691         14.5010        -92.2190        -92.8868         1.93638     188.252E-03        -1.64949    -996.662E-03
           2         131.336         10.9038        -92.2281        -92.8663         1.93341     188.250E-03        -1.64672    -995.468E-03
           3         132.130         18.7534        -92.4681        -92.5002         1.93968     188.190E-03        -1.65258    -997.959E-03
           4         130.769         1.97638        -92.5186        -92.3953         1.92580     188.179E-03        -1.63965    -992.387E-03
           5         130.560        -4.04517        -93.1433        -91.3993         1.92030     188.026E-03        -1.63459    -990.122E-03
           6         132.422         24.0768        -93.9662        -90.1454         1.94282     187.819E-03        -1.65564    -999.062E-03
           7         130.377        -8.39503        -94.1640        -89.7827         1.91586     187.774E-03        -1.63054    -988.235E-03
           8         126.321         13.6556        -88.0641        -89.5278         1.93579     192.554E-03        -1.64736    -998.202E-03
           9         125.963         4.31065        -88.6558        -89.3771         1.92786     192.145E-03        -1.64012    -994.852E-03
          10         130.037         3.02359        -94.4877        -89.2894         1.92501     187.692E-03        -1.63909    -991.871E-03
          11         126.692         18.5888        -88.1164        -89.1107         1.93970     192.653E-03        -1.65097    -999.810E-03
          12         125.751        -1.96189        -89.1238        -88.6928         1.92231     192.010E-03        -1.63500    -992.572E-03
          13         125.719        -3.46723        -89.2798        -88.4437         1.92094     191.971E-03        -1.63373    -992.005E-03
          14         130.026         7.42596        -95.0372        -88.4289         1.92818     187.556E-03        -1.64210    -993.086E-03
          15         130.736         16.3557        -95.3755        -87.9092         1.93527     187.472E-03        -1.64873    -995.891E-03
          16         130.251        -12.8122        -95.5572        -87.5783         1.91105     187.430E-03        -1.62618    -986.163E-03
          17         130.250         12.8770        -95.6602        -87.4548         1.93216     187.401E-03        -1.64586    -994.616E-03
          18         125.609        -7.73838        -90.1949        -87.0785         1.91668     191.718E-03        -1.62985    -990.191E-03
          19         124.466        -6.21492        -88.8834        -86.9075         1.91827     192.783E-03        -1.63095    -991.270E-03
          20         126.958         23.9470        -89.5421        -86.7584         1.94289     192.337E-03        -1.65406        -1.00096
          21         121.210         6.64491        -84.7929        -86.3587         1.92993     196.112E-03        -1.64059    -997.316E-03
          22         121.369         12.5781        -84.3620        -86.3434         1.93495     196.450E-03        -1.64514    -999.468E-03 
        ....

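I would like to perform the following steps:

  1. Remove the first two columns from File1
  2. Compare the node labels of the two files
  3. Write an output text file in ".rpt" format containing, side by side, the rows that have the same node label

This is the code I have used. It seems to work for small files, but for large files it takes a very long time.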

    nodEl = open("P:/File1.rpt", "r")
    uniNod = open("P:/File2.rpt", "r")
    
    row_nodEl  = nodEl.readlines()
    row_uniNod = uniNod.readlines()
    
    nodEl.close()
    uniNod.close()
    
    output = open("P:/output.rpt", "w")
    
    for index, line in enumerate(row_nodEl):
        if index > 23081 and index < 40572 and index !=23083 and index !=23084:
            temp  = line.strip()
            temp2 = " ".join(temp.split()) 
            var   = temp2.split(" ",3) 
            for index2, line2 in enumerate(row_uniNod):
                if index2 > 11412 and index2 < 21258 and index2 != 11414 and index2 !=11415: 
                    temp3 = line2.strip()  # use the File2 line, not the File1 line
                    temp4 = " ".join(temp3.split())
                    var2  = temp4.split(" ",1)
                    if var[2] == var2[0]:
                        output.write("%s %s %s\n" % (var[2], var[3], var2[1]))
    

Any suggestions are welcome!

1 answer:

Answer 0 (score: 1)

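You are comparing every line of one file (m lines) with every line of the other file (n lines). This gives a time complexity of O(m*n), which means that two files of 10,000 lines each produce 100,000,000 comparisons.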

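You can speed up the code by changing how you read the values. Consider reading the files into dictionaries instead of lists. Each key in a dictionary is a node number, and each value is the complete line.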
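As a rough illustration of such a dictionary, here is a minimal sketch. The function name `index_by_node` and its `node_column` parameter are made up for this example, and the sketch stores each row as a list of whitespace-separated fields rather than the raw line. It assumes that every data row has an integer in the node column and that anything else (headers, blank lines, "...." separators) can be skipped.

```python
def index_by_node(lines, node_column=0):
    """Map each node label to its whitespace-split row.

    `lines` is any iterable of strings (an open file works too).
    Rows whose node column is not an integer are skipped.
    """
    index = {}
    for line in lines:
        fields = line.split()
        if len(fields) <= node_column or not fields[node_column].isdigit():
            continue  # header, blank line, or separator
        # If a node label occurs more than once, the last row wins.
        index[fields[node_column]] = fields
    return index


# Three lines in the layout of File2 (header plus two data rows).
sample = [
    "        Node  COORD.Magnitude     COORD.COOR1",
    "           1         131.691         14.5010",
    "           2         131.336         10.9038",
]
nodes = index_by_node(sample)
print(nodes["2"])
```

For File1, where the node label is the third column, the same hypothetical helper would be called with `node_column=2`.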

With this approach, you can do the following:

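  1. Load the first file into a dictionary
  2. Load the second file into a dictionary
  3. For each node in the first dictionary, find the corresponding node in the second dictionary

In Python, it would look similar to this: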

    # load_file is a helper (not shown here) that returns a dict
    # mapping each node label to its line.
    file_contents_1 = load_file("P:/File1.rpt")
    file_contents_2 = load_file("P:/File2.rpt")
    
    for node_label in file_contents_1:
        # Skip nodes that have no corresponding values in the second file
        if node_label not in file_contents_2:
            continue
        # Do something
    

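The benefit of this approach is that each file is loaded separately in a single pass, so the overall time complexity becomes linear, O(m+n). Looking up the corresponding node in the second file takes constant time on average, because dictionaries are implemented as hash tables.

This should make your code much faster.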
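To tie this back to the three steps in the question, here is one possible end-to-end sketch. It is not the only way to do it, and the function name `join_on_node` is invented for illustration. It deviates slightly from the two-dictionary outline: the File1 excerpt shows the same node under several elements (for example node 6452 under elements 930 and 3937), so keying File1 by node would keep only one of those rows. The sketch therefore indexes only File2 and streams through File1. It assumes data rows are recognisable by an integer in the node column.

```python
def join_on_node(file1_lines, file2_lines):
    """Yield 'node <File1 values> <File2 values>' for nodes found in both.

    File1 rows: Element, Facet, Node, values...  (first two columns dropped)
    File2 rows: Node, values...
    """
    # Index File2 once: node label -> remaining columns.
    coords = {}
    for line in file2_lines:
        fields = line.split()
        if fields and fields[0].isdigit():
            coords[fields[0]] = fields[1:]

    # Stream File1; each lookup in `coords` is a hash-table access.
    for line in file1_lines:
        fields = line.split()
        if len(fields) < 3 or not fields[2].isdigit():
            continue  # header, blank line, or "...." separator
        node = fields[2]
        if node in coords:
            yield " ".join([node] + fields[3:] + coords[node])


# Invented sample rows in the same layout as the two report files.
file1 = [
    "  Element  Facet  Node  CPRESS",
    "      930      3  6452      0.",
    "     3937      4  6452      0.",
    "     2380      2  3757  564.697E-06",
]
file2 = [
    "  Node  U.Magnitude",
    "  3757      1.93638",
    "  6452      1.93341",
]
for row in join_on_node(file1, file2):
    print(row)
```

Open file objects can be passed in place of the lists, and each yielded row can be written to the output file followed by a newline. If only a slice of each report is relevant, as with the index ranges in the question, the iterables could be trimmed with `itertools.islice` before being passed in.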