How to read a log file with pandas after a specific text

Asked: 2021-07-15 13:25:27

Tags: python pandas dataframe

I have a log file that I need to simplify and export as an Excel file. The log file contains a specific piece of text, and I need the data that comes after it. How can I read this log file and export it?

Root node processing (before b&c):
  Real time             =    4.89 sec. (2902.79 ticks)
Parallel b&c, 4 threads:
  Real time             =   96.05 sec. (38798.86 ticks)
  Sync time (average)   =    5.63 sec.
  Wait time (average)   =    0.01 sec.
                          ------------
Total (root+branch&cut) =  100.94 sec. (41701.65 ticks)

Solution pool: 8 solutions saved.

MIP - Integer optimal solution:  Objective =  1.1401550956e+03
Solution time =  100.94 sec.  Iterations = 501135  Nodes = 2819
Deterministic time = 41701.69 ticks  (413.12 ticks/sec)


Incumbent solution
Variable Name           Solution Value
x4_5_14                       1.000000
x4_5_24                       1.000000
x4_5_34                       1.000000
x4_5_52                       1.000000
x4_5_82                       1.000000
x4_5_106                      1.000000
x4_5_118                      1.000000
x4_5_142                      1.000000
x4_5_154                      1.000000
x4_6_19                       1.000000
x4_6_29                       1.000000
x4_6_40                       1.000000
x4_6_58                       1.000000
x4_6_88                       1.000000
x4_6_112                      1.000000
x4_6_124                      1.000000
x4_6_148                      1.000000
x4_6_160                      1.000000
x5_5_9                        1.000000
x5_5_19                       1.000000
x5_5_29                       1.000000
x5_5_46                       1.000000
x5_5_58                       1.000000
x5_5_70                       1.000000
x5_5_94                       1.000000
x5_5_130                      1.000000
x5_5_142                      1.000000
x5_5_154                      1.000000
x5_5_166                      1.000000
x5_5_178                      1.000000
x5_6_14                       1.000000
x5_6_24                       1.000000
x5_6_34                       1.000000
x5_6_52                       1.000000
x5_6_64                       1.000000
x5_6_76                       1.000000
x5_6_100                      1.000000
x5_6_136                      1.000000
x5_6_148                      1.000000
x5_6_160                      1.000000
x5_6_172                      1.000000
x5_6_184                      1.000000
x9_5_4                        1.000000
x9_5_14                       1.000000
x9_5_29                       1.000000
x9_5_40                       1.000000
x9_5_64                       1.000000
x9_5_76                       1.000000
x9_5_88                       1.000000
x9_5_100                      1.000000
x9_5_112                      1.000000
x9_5_124                      1.000000
x9_5_136                      1.000000
x9_5_148                      1.000000
x9_5_160                      1.000000
x9_5_172                      1.000000
x9_6_9                        1.000000
x9_6_19                       1.000000
x9_6_34                       1.000000
x9_6_46                       1.000000
x9_6_70                       1.000000
x9_6_82                       1.000000
x9_6_94                       1.000000
x9_6_106                      1.000000
x9_6_118                      1.000000
x9_6_130                      1.000000
x9_6_142                      1.000000
x9_6_154                      1.000000
x9_6_166                      1.000000
x9_6_178                      1.000000
x11_1_12                      1.000000
x11_1_24                      1.000000
x11_1_40                      1.000000
x11_1_60                      1.000000
x11_1_83                      1.000000
x11_1_105                     1.000000
x11_1_128                     1.000000
x11_1_140                     1.000000
x11_1_154                     1.000000
x11_2_19                      1.000000
x11_2_32                      1.000000
x11_2_52                      1.000000
x11_2_72                      1.000000
x11_2_94                      1.000000
x11_2_116                     1.000000
x11_2_135                     1.000000
x11_2_148                     1.000000
x11_2_162                     1.000000
x17_1_30                      1.000000
x17_1_136                     1.000000
x17_2_37                      1.000000
x17_2_142                     1.000000
x18_1_10                      1.000000
x18_1_23                      1.000000
x18_1_36                      1.000000
x18_1_56                      1.000000
x18_1_76                      1.000000
x18_1_99                      1.000000
x18_1_121                     1.000000
x18_1_137                     1.000000
x18_1_149                     1.000000
x18_1_184                     1.000000
x18_1_196                     1.000000
x18_1_208                     1.000000
x18_2_17                      1.000000
x18_2_30                      1.000000
x18_2_48                      1.000000
x18_2_68                      1.000000
x18_2_88                      1.000000
x18_2_110                     1.000000
x18_2_131                     1.000000
x18_2_143                     1.000000
x18_2_156                     1.000000
x18_2_190                     1.000000
x18_2_202                     1.000000
x18_2_214                     1.000000
x23_1_17                      1.000000
x23_1_30                      1.000000
x23_1_153                     1.000000
x23_2_24                      1.000000
x23_2_37                      1.000000
x23_2_159                     1.000000
x27_1_7                       1.000000
x27_1_19                      1.000000
x27_1_32                      1.000000
x27_1_48                      1.000000
x27_1_68                      1.000000
x27_1_89                      1.000000
x27_1_131                     1.000000
x27_1_143                     1.000000
x27_1_157                     1.000000
x27_1_170                     1.000000
x27_1_202                     1.000000
x27_2_14                      1.000000
x27_2_26                      1.000000
x27_2_40                      1.000000
x27_2_60                      1.000000
x27_2_80                      1.000000
x27_2_100                     1.000000
x27_2_137                     1.000000
x27_2_150                     1.000000
x27_2_165                     1.000000
x27_2_176                     1.000000
x27_2_208                     1.000000
x32_1_19                      1.000000
x32_1_33                      1.000000
x32_1_137                     1.000000
x32_1_153                     1.000000
x32_2_26                      1.000000
macost52                      8.710800
macost60                     54.797800
macost                      599.535600
dricost4                     16.339460
dricost5                     21.878260
dricost9                     25.201540
dricost11                    21.324380
dricost17                     3.877160
dricost18                    26.309300
dricost23                     6.369620
dricost27                    24.924600
dricost32                     8.862080
dricost40                    22.432140
dricost41                     2.492460
dricost43                    21.324380
dricost45                     9.969840
dricost46                    13.293120
dricost47                    11.908420
dricost52                     3.877160
dricost60                    23.539900
dricost                     263.923820
tmil4                       115.290000
tmil5                       153.720000
tmil9                       179.340000
tmil11                      138.150000
tmil17                       30.700000
tmil18                      184.200000
tmil23                       46.050000
tmil27                      168.850000
tmil32                       61.400000
tmil40                      153.500000
tmil41                       15.350000
tmil43                      138.150000
tmil45                       61.400000
tmil46                       92.100000
tmil47                       76.750000
tmil52                       25.620000
tmil60                      168.850000
tmil                       1809.420000
ttime4                      295.000000
ttime5                      395.000000
ttime9                      455.000000
ttime11                     385.000000
ttime17                      70.000000
ttime18                     475.000000
ttime23                     115.000000
ttime27                     450.000000
ttime32                     160.000000
ttime40                     405.000000
ttime41                      45.000000
ttime43                     385.000000
ttime45                     180.000000
ttime46                     240.000000
ttime47                     215.000000
ttime52                      70.000000
ttime60                     425.000000
ttime                      4765.000000
tboar                     10275.000000
nbus                         34.000000
All other variables matching '*' are 0.

I need the data after the "MIP - Integer optimal solution" line. I want to extract the Objective, Solution time, Iterations, Nodes and Deterministic time values, and the data below the "Incumbent solution" text.

I tried this:

import pandas as pd
import itertools
import os
x = pd.read_csv(os.path.expanduser('G1/Cplex_Cng12/RGroup1_cng12.log'), usecols=[0])
print(x[135:])

But the number of lines above the desired text is not always the same, so I can't use the skiprows feature. I need to simplify this and use only the data below the "Incumbent solution" text. I also need to get the Objective, Solution time, Iterations, Nodes and Deterministic time values; they are on the same lines, and I need to split them apart.

3 Answers:

Answer 0 (score: 1)

You should parse the file using reliable delimiters. Here I chose `MIP - Integer optimal solution`, `Incumbent solution` and `All other variables matching` as the delimiters. If these markers are not reliable in your logs, you may need to adapt the code.

Full code:

import re, io
import pandas as pd

start_collecting_annotations = False
start_collecting_data = False
annotations_lines = []
data_lines = []
with open('/tmp/log.txt') as f:
    while True:
        line = f.readline()
        if line == '':  # if no more lines to read, stop
            break
        if line.startswith('MIP - Integer optimal solution'):
            start_collecting_annotations = True
        if line.startswith('Incumbent solution'):
            start_collecting_data = True
        if start_collecting_annotations:  # here we collect the annotations
            if line == '\n':
                start_collecting_annotations = False
            else:
                annotations_lines.append(line)
        if start_collecting_data:        # here we collect the data
            if line.startswith('All other variables matching'):
                break
            else:
                data_lines.append(line)

# Parse "name = value" pairs from the summary lines into a float Series
annotations = pd.Series(dict([re.split(r'\s+=\s+', i)
                              for i in re.findall(r'(?:[^\s]+ )?[^\s]+\s+=\s+[^\s]+',
                                                  ' '.join(annotations_lines))
                             ])).astype(float)
# Runs of 2+ spaces separate the two columns; a regex sep needs the python engine
df = pd.read_csv(io.StringIO(''.join(data_lines[1:])), sep=r'\s\s+',
                 engine='python', index_col=[0])

Output:

>>> annotations

Objective               1140.155096
Solution time            100.940000
Iterations            501135.000000
Nodes                   2819.000000
Deterministic time     41701.690000
dtype: float64

>>> df.head()

               Solution Value
Variable Name                
x4_5_14                   1.0
x4_5_24                   1.0
x4_5_34                   1.0
x4_5_52                   1.0
x4_5_82                   1.0
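Once the variables are in a DataFrame like this, simplifying further (e.g. keeping only the `dricost*` rows) is a one-liner. A minimal sketch with a small stand-in for the full `df` built above:

```python
import pandas as pd

# Stand-in for a few rows of the `df` produced above (illustrative subset).
df = pd.DataFrame(
    {'Solution Value': [1.0, 16.339460, 21.878260, 115.29]},
    index=pd.Index(['x4_5_14', 'dricost4', 'dricost5', 'tmil4'],
                   name='Variable Name'))

# Boolean mask on the index: keep only the driver-cost rows.
dricost = df[df.index.str.startswith('dricost')]
print(dricost)

# To export, df.to_excel('solution.xlsx') would work (requires openpyxl).
```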

Answer 1 (score: 0)

Here is a "brute force" way of doing it that plenty of people will complain about, but hey, it works:

import pandas as pd

data = pd.read_csv("test.txt", sep='\t')

for i in range(len(data)):
    if data[f'{list(data.columns)[0]}'][i][0:3] == 'MIP':
        Objective = float(data[f'{list(data.columns)[0]}'][i][46:62])
        Solution_time = float(data[f'{list(data.columns)[0]}'][i+1][17:23])
        Iteration = int(data[f'{list(data.columns)[0]}'][i+1][43:49])
        Nodes = int(data[f'{list(data.columns)[0]}'][i+1][59:64])
        Deterministic_time = float(data[f'{list(data.columns)[0]}'][i+2][21:29])
        break
print(Objective, Solution_time, Iteration, Nodes, Deterministic_time)

test.txt is your data above; I just copied it into a txt file.
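The hard-coded slice positions (`[46:62]`, `[17:23]`, …) break as soon as any number changes width. A sketch of a width-independent variant of the same idea, splitting on `=` instead (the two log lines are inlined here for illustration):

```python
import re

line1 = 'MIP - Integer optimal solution:  Objective =  1.1401550956e+03'
line2 = 'Solution time =  100.94 sec.  Iterations = 501135  Nodes = 2819'

# Everything after the '=' on the first line is the objective value.
objective = float(line1.split('=')[1])
# Pull each "name = value" chunk from the second line, whatever its width.
fields = dict(re.findall(r'(\w[\w ]*?)\s*=\s*([\d.]+)', line2))
print(objective, fields)
```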

Answer 2 (score: 0)

I went another way: I created a .txt file containing the data you posted, loaded it into Python, wrote it out as an Excel file, then read that back and built two dataframes, one for the head data and a second for the tail. I know this is not the most efficient approach, but I'm still learning :)

import xlwt
import xlrd
import pandas as pd

book = xlwt.Workbook()
ws = book.add_sheet('First Sheet')
f = open('tekst.txt', 'r+')
data = f.readlines()
for i in range(len(data)):
  row = data[i].split()
  for j in range(len(row)):
    ws.write(i, j, row[j])
#Creation Excel
book.save('Excelfile' + '.xls')
f.close()
#Write Excel and modify data
df = pd.read_excel('Excelfile.xls')
df = df[['Root', 'node', 'processing', '(before', 'b&c):']]
df = df[10:]
df = df.reset_index(drop=True)
df = df.rename(columns={df.columns[0]: 'Col_1',df.columns[1]: 'Col_2',df.columns[2]: 'Col_3',df.columns[3]: 'Col_4',df.columns[4]: 'Col_5'})
df_head = df[0:3].reset_index(drop=True)
df_bottom = df[7:].reset_index(drop=True)
df_bottom = df_bottom[['Col_1','Col_2']]
df_bottom = df_bottom.rename(columns={df_bottom.columns[0]: 'Variable Name',df_bottom.columns[1]: 'Solution Value'})

The output looks like this:

df_bottom
       Col_1     Col_2
0    x4_5_14  1.000000
1    x4_5_24  1.000000
2    x4_5_34  1.000000
3    x4_5_52  1.000000
df_head
           Col_1 Col_2    Col_3     Col_4      Col_5
0            MIP     -  Integer   optimal  solution:
1       Solution  time        =    100.94       sec.
2  Deterministic  time        =  41701.69      ticks

Hope this helps.
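For comparison, the Excel round-trip above can be skipped entirely: pandas can split the whitespace-separated variable lines directly. A minimal sketch (the block is inlined here; normally it would come from the log file):

```python
import io
import pandas as pd

# Stand-in for the variable block of the log file.
block = """x4_5_14                       1.000000
x4_5_24                       1.000000
x4_5_34                       1.000000"""

# Split each line on runs of whitespace into name/value columns.
df_bottom = pd.read_csv(io.StringIO(block), sep=r'\s+', engine='python',
                        names=['Variable Name', 'Solution Value'])
print(df_bottom)
```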