文本更清晰的文本excel电子表格项目

时间:2017-08-25 04:40:03

标签: python excel python-3.x

我有一个充满pc数据的文本文件,组织为两种类型之一的块列表。之一:

*redacted*

我的目标是让Python(3.6.2)打开并读取文件,清理它,然后将数据编译成excel电子表格,如下所示:

Column 1: PC name 
Column 2: Error Type (0 if none, 1-4 for 4 error types)
Column 3: ID (if no error, no braces containing the ID)
Column 4: Password (if no error, just the password)

这是我的代码。我使用Pycharm,并且处于虚拟环境中:

import xlsxwriter

workbook = xlsxwriter.Workbook('Computer Data.xlsx')
worksheet = workbook.add_worksheet()

bold = workbook.add_format({'bold': True})
left = workbook.add_format({'align': 'justify'})

worksheet.set_column(0, 0, 14)
worksheet.set_column(1, 1, 5)
worksheet.set_column(2, 2, 38)
worksheet.set_column(3, 3, 55)

worksheet.write('A1', 'Name', bold)
worksheet.write('B1', 'Error', bold)
worksheet.write('C1', 'ID', bold)
worksheet.write('D1', 'Password', bold)

def nonblank_lines(f):
    for l in f:
        line = l.rstrip()
        if line:
            yield line.lstrip

with open("C:\\Users\\MyName\\Desktop\\BLRP.txt", "r+") as op:
    gold_lst = []
    nonblank = nonblank_lines(op)
    for line in nonblank:
        if line.startswith("Computer Name"):
            gold_lst.append(str(line))
            gold_lst.append("NO ERROR")
        elif line.startswith("ID"):
            gold_lst.append("IDG: " + str(line))
            gold_lst.append('NO ERROR')
        elif line.startswith("ERROR: An error occurred while"):
            gold_lst.append('1')
            gold_lst.append(str('ID: {' + line + '}'))
            gold_lst.append(str('Password: '))
        elif line.startswith("ERROR: No key"):
            gold_lst.append('2')
            gold_lst.append(str('ID: {' + line + '}'))
            gold_lst.append(str('Password: '))
        elif line.startswith("ERROR: An error occurred (code 0x80070057)"):
            gold_lst.append('3')
            gold_lst.append(str('ID: {' + line + '}'))
            gold_lst.append(str('Password: '))
        elif line.startswith("ERROR: An error occurred (code 0x8004100e)"):
            gold_lst.append('4')
            gold_lst.append(str('ID: {' + line + '}'))
            gold_lst.append(str('Password: '))
        elif line.startswith("Password"):
            gold_lst.append(str('Password: ' + next(nonblank)))
    print(gold_lst)
    op.close()

pc_data = (gold_lst)

row = 1
col = 0

for obj in pc_data:
    if obj.startswith("Computer Name"):
        worksheet.write_string(row, col, obj[15:])
    elif obj.startswith('NO'):
        worksheet.write_number(row, col + 1, 0, left)
    elif obj.startswith('1'):
        worksheet.write_number(row, col + 1, int(obj), left)
    elif obj.startswith('2'):
        worksheet.write_number(row, col + 1, int(obj), left)
    elif obj.startswith('3'):
        worksheet.write_number(row, col + 1, int(obj), left)
    elif obj.startswith('4'):
        worksheet.write_number(row, col + 1, int(obj), left)
    elif obj.startswith("ID: {ERROR"):
        worksheet.write_string(row, col + 2, '')
    elif obj.startswith("IDG: "):
        worksheet.write_string(row, col + 2, obj[10:-1])
    elif obj.startswith("Password"):
        worksheet.write_string(row, col + 3, obj[9:]) 
        row += 1

workbook.close()

现在,这对于有问题的文件非常有效,但是,除了非常不理想的代码之外,我确信,我可以明确地看到需要改进的内容。在这个块中:

if line.startswith("Computer Name"):
                gold_lst.append(str(line))
                gold_lst.append("NO ERROR")

我只想要" NO ERROR"如果我的行以"计算机名称"开头,则附加到我的列表中并且下一个非空白行不以" ERROR开头。"当然,我试过这个:

if line.startswith("Computer Name"):
        if next(nonblank).startswith("ERROR"):
            gold_lst.append(str(line))
        elif next(nonblank).startswith("VOLUME"):
            gold_lst.append(str(line))
            gold_lst.append("NO ERROR")

问题是,这会创建一个顶起的Excel电子表格,我根本不知道为什么。即使在主代码中的后续步骤中我打印gold_lst(只是为了检查列表是否正确),列表非常不准确。我甚至无法弄清楚列表的内容。

我该如何解决这个问题?

关于第二个问题,如果我可以在同一主题中提出这个问题,我将来可能收到的此类更常规的文本文件可能包含具有多个ID和密码的计算机。如果我不得不猜测,该块看起来像这样:

*redacted*

可能有超过2个这样的ID /密码组合。 如何修改我的代码以实现此目的?就目前而言,我的代码不会轻易解释这一点。我对Python很陌生,所以也许它可以,但我不会看到它。

1 个答案:

答案 0 :(得分:1)

解决此问题的方法如下:

  1. 读入整个文件,跳过任何空行。
  2. 使用Python的groupby()函数将行列表拆分为基于Computer Name行的块。
  3. 对于每个块,尝试提取错误以及ID和密码列表。如果不存在,请留空。
  4. 对于每个块,将任何提取的数据写入电子表格中的下一行。
  5. 脚本如下:

    from itertools import groupby
    import xlsxwriter
    import re
    
    workbook = xlsxwriter.Workbook('Computer Data.xlsx')
    worksheet = workbook.add_worksheet()
    
    bold = workbook.add_format({'bold': True})
    left = workbook.add_format({'align': 'justify'})
    cols = [('Name', 14), ('Error', 5), ('ID1', 38), ('Password1', 55), ('ID2', 38), ('Password2', 55), ('ID3', 38), ('Password3', 55)]
    
    for colx, (heading, width) in enumerate(cols):
        worksheet.write_string(0, colx, heading, bold)
        worksheet.set_column(colx, colx, width)
    
    rowy = 1
    lines = []
    data = []
    computer_name = None
    
    with open('BLRP.txt') as f_input:
        lines = [line.strip() for line in f_input if len(line.strip())]
    
    for k, g in groupby(lines, lambda x: x.startswith("Computer Name:")):
        if k:
            computer_name = re.search(r'Computer Name:\s*(.*)\s*', list(g)[0]).group(1)
        elif computer_name:
            block = list(g)
            error = 'NO ERROR'
            ids = []
            passwords = []
    
            for line_number, line in enumerate(block):
                re_error = re.match('ERROR:\s+"(.*?)"', line)
    
                if re_error:
                    error = re_error.group(1)
    
                if line.startswith('Numerical Password:'):
                    ids.append(re.search('\{(.*?)\}', block[line_number+1]).group(1))
                    passwords.append(block[line_number+3].strip())
    
            worksheet.write_string(rowy, 0, computer_name)
            worksheet.write_string(rowy, 1, error)
    
            for index, (id, pw) in enumerate(zip(ids, passwords)):
                worksheet.write_string(rowy, index * 2 + 2, id)
                worksheet.write_string(rowy, index * 2 + 3, pw)
    
            rowy += 1   # Advance to the next output row
    
    workbook.close()   
    

    假设您的BLRP.txt如下:

    Computer Name: "Name Here1"
    
    ERROR: "some type of error"
    
    Blah blah
    Blah blah
    Blah blah
    
    Computer Name: "Name Here2"
    
    Volume blah blah
    Blah Blah
    
             Numerical Password:
                ID: {"The ID1 is here; long string of random chars"}
                  Password:
                    "Password1 here; also a long string"
    
    Blah Blah
    Blah Blah
    
             Numerical Password:
                ID: {"The ID2 is here; long string of random chars"}
                  Password:
                    "Password2 here; also a long string"
    
    Blah Blah
    Blah Blah
    
             Numerical Password:
                ID: {"The ID3 is here; long string of random chars"}
                  Password:
                    "Password3 here; also a long string"
    
    Blah Blah
    Blah Blah
    

    您将获得如下电子表格:

    Spreadsheet screenshot

    groupby()如何运作?

    通常,当您遍历列表时,它会一次为您提供一个项目。使用groupby(),您可以在" groups"中迭代此列表,其中每个组中的项目数基于条件。条件以函数的形式提供(我使用lambda来避免编写单独的函数)。

    groupby()将构建要返回的项目组,直到函数的结果发生更改。在这种情况下,该函数正在查找以单词Computer Name开头的行。因此,如果这是真的,它将返回一个项目(除非它们上面有两个相邻的行Computer Name)。接下来它将返回所有不以Computer Name开头的行,依此类推。

    它返回两件事,keygroupkey是函数startswith()的结果,该函数可以是TrueFalsegroup是一个包含所有匹配项的iterable。 list(g)用于将其转换为普通列表,在这种情况下,返回下一行Computer Name行之前的所有行。

    将条目写入不同的行并将已知的错误消息转换为数字:

    from itertools import groupby
    import xlsxwriter
    import re
    
    workbook = xlsxwriter.Workbook('Computer Data.xlsx')
    worksheet = workbook.add_worksheet()
    
    bold = workbook.add_format({'bold': True})
    left = workbook.add_format({'align': 'justify'})
    cols = [('Name', 14), ('Error', 5), ('ID', 38), ('Password', 55)]
    
    for colx, (heading, width) in enumerate(cols):
        worksheet.write_string(0, colx, heading, bold)
        worksheet.set_column(colx, colx, width)
    
    rowy = 1
    lines = []
    data = []
    computer_name = None
    
    error_numbers = {
        'An error occurred while connecting to the BitLocker management interface.' : 1, 
        'No key protectors found.' : 2, 
        'An error occurred (code 0x80070057):' : 3,
        'An error occurred (code 0x8004100e):' : 4}
    
    with open('BLRP.txt') as f_input:
        lines = [line.strip() for line in f_input if len(line.strip())]
    
    for k, g in groupby(lines, lambda x: x.startswith("Computer Name:")):
        block = list(g)
    
        if k:
            computer_name = re.search(r'Computer Name:\s*(.*)\s*', block[0]).group(1)
        elif computer_name:
            error_number = 0        # 0 for NO ERROR
            ids = []
            passwords = []
    
            for line_number, line in enumerate(block):
                re_error = re.match('ERROR:\s+?(.*)\s*?', line)
    
                if re_error:
                    error = re_error.group(1)
                    error_number = error_numbers.get(error, -1)     # Return -1 for an unknown error
    
                if line.startswith('Numerical Password:'):
                    ids.append(re.search('\{(.*?)\}', block[line_number+1]).group(1))
                    passwords.append(block[line_number+3].strip())
    
            worksheet.write_string(rowy, 0, computer_name)
            worksheet.write_number(rowy, 1, error_number)
    
            for id, pw in zip(ids, passwords):
                worksheet.write_string(rowy, 0, computer_name)
                worksheet.write_number(rowy, 1, error_number)
                worksheet.write_string(rowy, 2, id)
                worksheet.write_string(rowy, 3, pw)
                rowy += 1   # Advance to the next output row
    
            if len(ids) == 0:
                rowy += 1   # Advance to the next output row            
    
    workbook.close()