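How to use Python groupby to split test names and log details from a given text file

Asked: 2018-11-08 09:31:46

Tags: python python-2.7

From the input file below, I want to split out each test name and its associated log details.

Input file: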

2/1/1/2/tasdf.c:

LOG:
        backslash-newline should be deleted before tokenizing
    No diagnostics line
RESULT: 2/1/1/2/tasdf.c                                          FAILED

----------------------------------------
2/1/1/2/tlasdf.c:

LOG:
+++ stderr ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
tlasdf.c:15:5: error: use of undeclared identifier '_t'
    t x[] = L\
    ^
ls: cannot access '*.o': No such file or directory
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    | T | Translation Phases | 2 | \\ | L | 2 |
    Compilation failed
RESULT: 2/1/1/2/tlasdf.c                                          FAILED

----------------------------------------
2/2/4/1/texasdfgen(0):

LOG:
    511 external identifiers in one source file
    Compilation failed ungracefully
RESULT: 2/2/4/1/textasdf.gen                                    FAILED

Code used for splitting:

import re
import sys

# Input file path
TEST = sys.argv[1]

# Open the input file and collect the test-name lines
def testname(FILE):
    testlist = []
    with open(FILE, 'r') as f:
        for line in f:
            match1 = re.search(r'.*\.c\:$|.*\.gen\(\d+\)\:$', line)
            if match1:
                testlist.append(match1.group(0))
    return testlist

# Open the input file and collect the log-detail lines
def logdetail(FILE):
    array = []
    with open(FILE) as f:
        for line in f:
            if line.startswith('LOG:'):
                for line in f:
                    if line.startswith('RESULT:'):
                        break
                    # otherwise keep the line from the LOG section
                    array.append(line.strip())
    return array

testnames = testname(TEST)
for test in testnames:
    print(test)

logdetails = logdetail(TEST)
print(logdetails)

The test names print correctly and the log details end up in the array, but how do I merge each test name with its associated log details?

Output of the current code:

2/1/1/2/tasdf.c:
2/1/1/2/tlasdf.c:
2/2/4/1/tiasdf.gen(0):
['backslash-newline should be deleted before tokenizing', 'No diagnostics line', '+++ stderr ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++', "tlasdf.c:15:5: error: use of undeclared identifier '_t'", 't x[] = L\\', '^', "ls: cannot access '*.o': No such file or directory", '+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++', '| T | Translation Phases | 2 | \\\\ | L | 2 |', 'Compilation failed', '511 external identifiers in one source file', 'Compilation failed ungracefully', '8 nesting levels of #include files', 'Compilation failed ungracefully']

Expected output:

2/1/1/2/tasdf.c:            backslash-newline should be deleted before tokenizing No diagnostics line
2/1/1/2/tlasdf.c:           +++ stderr ++++++++++++++++++++++++++++++++++++++tlex2.c:15:5: error: use of undeclared identifier 't'
2/2/4/1/textasdf.gen(0):    511 external identifiers in one source file  Compilation failed ungracefully

(In fact, my final goal is to write the output shown in the screenshot into an Excel sheet.)

[Screenshot: Expected Output]

2 Answers:

Answer 0: (score: 0)

First, modify logdetail() as follows:

def logdetail(FILE):
    collect = False
    array = []
    current = []
    with open(FILE, 'r') as f:
        for line in f:
            if line.startswith('LOG:'):
                collect = True
            else:
                if line.startswith('RESULT: '):
                    collect = False
                    array.append(current)
                    current=[]
                if collect:
                    current.append(line.strip())

    return(array)

Then use it for printing (assuming the number of test names always equals the number of log entries):

testnames = testname(TEST)
logdetails = logdetail(TEST)
# Pair each test name with the log entry at the same position
for test, log in zip(testnames, logdetails):
    print(test + '\t' + " ".join(log))
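Since the question title asks about groupby, here is a minimal sketch of a single-pass alternative built on itertools.groupby. It keeps each test name together with its own log, so it does not depend on two separate lists having the same length. The function name parse_records is hypothetical, and the sketch assumes that records are separated by a line of at least ten dashes, that the first non-blank line of each record is the test name, and that no log line itself starts with ten dashes.

```python
from itertools import groupby

def parse_records(lines):
    """Return a list of (test name, joined log text) pairs."""
    records = []
    # groupby yields alternating runs of separator and non-separator lines
    for is_separator, group in groupby(lines, key=lambda s: s.startswith('-' * 10)):
        if is_separator:
            continue
        # Drop blank lines and surrounding whitespace within the record
        block = [s.strip() for s in group if s.strip()]
        if not block or 'LOG:' not in block:
            continue
        name = block[0]
        log = []
        # Collect everything after 'LOG:' up to (not including) 'RESULT:'
        for entry in block[block.index('LOG:') + 1:]:
            if entry.startswith('RESULT:'):
                break
            log.append(entry)
        records.append((name, ' '.join(log)))
    return records

sample = """2/1/1/2/tasdf.c:

LOG:
        backslash-newline should be deleted before tokenizing
    No diagnostics line
RESULT: 2/1/1/2/tasdf.c                                          FAILED

----------------------------------------
2/2/4/1/textasdf.gen(0):

LOG:
    511 external identifiers in one source file
    Compilation failed ungracefully
RESULT: 2/2/4/1/textasdf.gen                                    FAILED
"""

for name, log in parse_records(sample.splitlines()):
    print(name + '\t' + log)
```

For a real file, the lines can come straight from the open file object, for example parse_records(open(TEST)).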

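Answer 1: (score: 0)

I think you can build a dictionary from your results and use its keys and values to populate the Excel file directly.

You need to modify the logdetail() function as follows: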

def extract_data(path_to_file):
    # Read the whole file; an empty file yields an empty string
    with open(path_to_file, "r") as in_file:
        return in_file.read()

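def logdetail(TEXT):
    # TEXT is a list of lines; keep only the lines between 'LOG:' and 'RESULT:'
    array = []
    temporary = []
    collect = False
    for line in TEXT:
        if line.startswith('LOG:'):
            collect = True
        elif line.startswith('RESULT:'):
            array.append('LOG: ' + ' '.join(temporary))
            temporary = []
            collect = False
        elif collect:
            temporary.append(line.strip())
    return array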

Then build a dictionary from the function results:

BIG_TEXT = extract_data(TEST)
logdetails = logdetail(BIG_TEXT.strip().split('\n'))
testnames = testname(TEST)

Merge = {}
for index, each in enumerate(testnames):
    try:
        Merge[each] = logdetails[index]
    except IndexError:
        # No log section was found for this test name
        Merge[each] = 'Not Given'
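One caveat: on Python 2.7 (the version this question is tagged with) a plain dict does not keep insertion order, so rows taken from it may not follow the order of the input file. A minimal sketch using collections.OrderedDict avoids that. The helper name merge_in_order and its missing parameter are hypothetical, chosen only for illustration.

```python
from collections import OrderedDict

def merge_in_order(testnames, logdetails, missing='Not Given'):
    """Pair each test name with the log at the same position, keeping file order."""
    merged = OrderedDict()
    for index, name in enumerate(testnames):
        # Fall back to the placeholder when there are fewer logs than names
        merged[name] = logdetails[index] if index < len(logdetails) else missing
    return merged

merged = merge_in_order(['a.c:', 'b.c:', 'c.c:'], ['log a', 'log b'])
for name, log in merged.items():
    print(name + '\t' + log)
```

The result can be used wherever the plain dictionary is used, since OrderedDict supports the same keys(), values() and items() calls.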

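Note that you can skip extract_data() and just pass the file contents directly.

Finally, you can use the dictionary's keys() for the first Excel column and its values() for the second.

Edit: To write this dictionary to an Excel file, following the attached screenshot: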

import xlsxwriter

workbook = xlsxwriter.Workbook(r'C:\Desktop\data.xlsx') # Create an excel file
worksheet = workbook.add_worksheet()

# Start at row 1, leaving row 0 free (e.g. for a header row)
for row, (key, value) in enumerate(Merge.items(), start=1):
    worksheet.write(row, 0, key)
    worksheet.write(row, 1, value)
workbook.close()
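If xlsxwriter is not installed, one possible fallback is to write the same two columns as a CSV file, which Excel can open. This is only a sketch of that alternative, not part of the original answer. The function name write_rows_as_csv and the header labels are hypothetical (the screenshot's actual headers are not available here), and the example targets Python 3. On Python 2.7 the csv module expects a file opened in 'wb' mode instead.

```python
import csv
import io

def write_rows_as_csv(merged, out_file):
    """Write a header row, then one (test name, log details) row per entry."""
    writer = csv.writer(out_file)
    writer.writerow(['Test name', 'Log details'])  # hypothetical header labels
    for key, value in merged.items():
        writer.writerow([key, value])

# Example with an in-memory buffer; a real file would be opened with
# open('data.csv', 'w', newline='') on Python 3.
buffer = io.StringIO()
write_rows_as_csv({'2/1/1/2/tasdf.c:': 'No diagnostics line'}, buffer)
print(buffer.getvalue())
```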