I have a text file full of PC data, organized as a list of blocks, each of which is one of two types. Either:
*redacted*
My goal is to have Python (3.6.2) open and read the file, clean it up, and compile the data into an Excel spreadsheet laid out as follows:
Column 1: PC name
Column 2: Error Type (0 if none, 1-4 for 4 error types)
Column 3: ID (if no error, no braces containing the ID)
Column 4: Password (if no error, just the password)
Here is my code. I use PyCharm and work inside a virtual environment:
import xlsxwriter
workbook = xlsxwriter.Workbook('Computer Data.xlsx')
worksheet = workbook.add_worksheet()
bold = workbook.add_format({'bold': True})
left = workbook.add_format({'align': 'justify'})
worksheet.set_column(0, 0, 14)
worksheet.set_column(1, 1, 5)
worksheet.set_column(2, 2, 38)
worksheet.set_column(3, 3, 55)
worksheet.write('A1', 'Name', bold)
worksheet.write('B1', 'Error', bold)
worksheet.write('C1', 'ID', bold)
worksheet.write('D1', 'Password', bold)
def nonblank_lines(f):
    # Yield each non-empty line with surrounding whitespace removed
    for l in f:
        line = l.rstrip()
        if line:
            yield line.lstrip()
with open("C:\\Users\\MyName\\Desktop\\BLRP.txt", "r+") as op:
    gold_lst = []
    nonblank = nonblank_lines(op)
    for line in nonblank:
        if line.startswith("Computer Name"):
            gold_lst.append(str(line))
            gold_lst.append("NO ERROR")
        elif line.startswith("ID"):
            gold_lst.append("IDG: " + str(line))
            gold_lst.append('NO ERROR')
        elif line.startswith("ERROR: An error occurred while"):
            gold_lst.append('1')
            gold_lst.append(str('ID: {' + line + '}'))
            gold_lst.append(str('Password: '))
        elif line.startswith("ERROR: No key"):
            gold_lst.append('2')
            gold_lst.append(str('ID: {' + line + '}'))
            gold_lst.append(str('Password: '))
        elif line.startswith("ERROR: An error occurred (code 0x80070057)"):
            gold_lst.append('3')
            gold_lst.append(str('ID: {' + line + '}'))
            gold_lst.append(str('Password: '))
        elif line.startswith("ERROR: An error occurred (code 0x8004100e)"):
            gold_lst.append('4')
            gold_lst.append(str('ID: {' + line + '}'))
            gold_lst.append(str('Password: '))
        elif line.startswith("Password"):
            gold_lst.append(str('Password: ' + next(nonblank)))
    print(gold_lst)
    op.close()
pc_data = (gold_lst)
row = 1
col = 0
for obj in pc_data:
    if obj.startswith("Computer Name"):
        worksheet.write_string(row, col, obj[15:])
    elif obj.startswith('NO'):
        worksheet.write_number(row, col + 1, 0, left)
    elif obj.startswith('1'):
        worksheet.write_number(row, col + 1, int(obj), left)
    elif obj.startswith('2'):
        worksheet.write_number(row, col + 1, int(obj), left)
    elif obj.startswith('3'):
        worksheet.write_number(row, col + 1, int(obj), left)
    elif obj.startswith('4'):
        worksheet.write_number(row, col + 1, int(obj), left)
    elif obj.startswith("ID: {ERROR"):
        worksheet.write_string(row, col + 2, '')
    elif obj.startswith("IDG: "):
        worksheet.write_string(row, col + 2, obj[10:-1])
    elif obj.startswith("Password"):
        worksheet.write_string(row, col + 3, obj[9:])
        row += 1
workbook.close()
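Now, this works perfectly well for the file in question. However, apart from the code being far from optimal (I'm sure), there is one thing I can clearly see needs improving. In this block: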
if line.startswith("Computer Name"):
    gold_lst.append(str(line))
    gold_lst.append("NO ERROR")
I only want "NO ERROR" appended to my list if the line starts with "Computer Name" and the next non-blank line does not start with "ERROR". Naturally, I tried this:
if line.startswith("Computer Name"):
    if next(nonblank).startswith("ERROR"):
        gold_lst.append(str(line))
    elif next(nonblank).startswith("VOLUME"):
        gold_lst.append(str(line))
        gold_lst.append("NO ERROR")
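The problem is that this produces a messed-up Excel spreadsheet, and I have no idea why. Even when I print gold_lst further down in the main code (just to check that the list is correct), the list is wildly inaccurate. I can't even make sense of what it contains.
How can I fix this?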
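As a second question, if I may ask it in the same thread: more general text files of this kind that I may receive in the future could contain computers with multiple IDs and passwords. If I had to guess, such a block would look like this: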
*redacted*
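There may be more than two such ID/password combinations. How can I modify my code to handle this? As it stands, my code will not easily account for it. I'm very new to Python, so maybe it can, but I wouldn't be able to tell.
Answer 0 (score: 1)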
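One way to approach this is to use Python's groupby() function to split the list of lines into blocks based on the Computer Name lines. The script is as follows: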
from itertools import groupby
import xlsxwriter
import re
workbook = xlsxwriter.Workbook('Computer Data.xlsx')
worksheet = workbook.add_worksheet()
bold = workbook.add_format({'bold': True})
left = workbook.add_format({'align': 'justify'})
cols = [('Name', 14), ('Error', 5), ('ID1', 38), ('Password1', 55), ('ID2', 38), ('Password2', 55), ('ID3', 38), ('Password3', 55)]
for colx, (heading, width) in enumerate(cols):
    worksheet.write_string(0, colx, heading, bold)
    worksheet.set_column(colx, colx, width)
rowy = 1
lines = []
data = []
computer_name = None
with open('BLRP.txt') as f_input:
    # Keep only non-blank lines, stripped of surrounding whitespace
    lines = [line.strip() for line in f_input if len(line.strip())]
for k, g in groupby(lines, lambda x: x.startswith("Computer Name:")):
    if k:
        computer_name = re.search(r'Computer Name:\s*(.*)\s*', list(g)[0]).group(1)
    elif computer_name:
        block = list(g)
        error = 'NO ERROR'
        ids = []
        passwords = []
        for line_number, line in enumerate(block):
            re_error = re.match(r'ERROR:\s+"(.*?)"', line)
            if re_error:
                error = re_error.group(1)
            if line.startswith('Numerical Password:'):
                # The ID is on the next line, the password three lines down
                ids.append(re.search(r'\{(.*?)\}', block[line_number+1]).group(1))
                passwords.append(block[line_number+3].strip())
        worksheet.write_string(rowy, 0, computer_name)
        worksheet.write_string(rowy, 1, error)
        for index, (id, pw) in enumerate(zip(ids, passwords)):
            worksheet.write_string(rowy, index * 2 + 2, id)
            worksheet.write_string(rowy, index * 2 + 3, pw)
        rowy += 1 # Advance to the next output row
workbook.close()
Assuming your BLRP.txt looks like this:
Computer Name: "Name Here1"
ERROR: "some type of error"
Blah blah
Blah blah
Blah blah
Computer Name: "Name Here2"
Volume blah blah
Blah Blah
Numerical Password:
ID: {"The ID1 is here; long string of random chars"}
Password:
"Password1 here; also a long string"
Blah Blah
Blah Blah
Numerical Password:
ID: {"The ID2 is here; long string of random chars"}
Password:
"Password2 here; also a long string"
Blah Blah
Blah Blah
Numerical Password:
ID: {"The ID3 is here; long string of random chars"}
Password:
"Password3 here; also a long string"
Blah Blah
Blah Blah
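You would get a spreadsheet with one row per computer: the name, the error text (or NO ERROR), and then one ID/password column pair for each combination found.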
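The rows can also be checked without opening Excel. The sketch below repeats the same grouping and regular expressions on a shortened copy of the sample, collecting each row in a list instead of writing it with xlsxwriter. The names parse_rows and SAMPLE are made up for this illustration and are not part of the script above.

```python
from itertools import groupby
import re

# Shortened copy of the sample file (hypothetical test data)
SAMPLE = '''\
Computer Name: "Name Here1"
ERROR: "some type of error"
Blah blah
Computer Name: "Name Here2"
Volume blah blah
Numerical Password:
ID: {"The ID1 is here"}
Password:
"Password1 here"
Numerical Password:
ID: {"The ID2 is here"}
Password:
"Password2 here"
'''

def parse_rows(text):
    """Return one list per computer: [name, error, id1, pw1, id2, pw2, ...]."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    rows = []
    computer_name = None
    for is_name, group in groupby(lines, lambda x: x.startswith("Computer Name:")):
        block = list(group)
        if is_name:
            computer_name = re.search(r'Computer Name:\s*(.*)', block[0]).group(1)
        elif computer_name:
            error = 'NO ERROR'
            pairs = []
            for i, line in enumerate(block):
                m = re.match(r'ERROR:\s+"(.*?)"', line)
                if m:
                    error = m.group(1)
                if line.startswith('Numerical Password:'):
                    # ID on the next line, password three lines down
                    pairs.append(re.search(r'\{(.*?)\}', block[i + 1]).group(1))
                    pairs.append(block[i + 3])
            rows.append([computer_name, error] + pairs)
    return rows

for row in parse_rows(SAMPLE):
    print(row)
```

With this sample, the surrounding double quotes stay in the name, ID and password values, because the patterns capture everything after Computer Name: and everything between the braces. Only the error pattern strips them.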
How does groupby() work?
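Normally, when you iterate over a list, it gives you one item at a time. With groupby(), you can iterate over the list in "groups", where the number of items in each group depends on a condition. The condition is supplied as a function (I used a lambda to avoid writing a separate function).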
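groupby() keeps adding items to the current group until the result of the function changes. In this case, the function checks whether a line starts with the words Computer Name. When that is true, it returns a group containing one item (unless two Computer Name lines are adjacent, in which case they land in the same group). Next it returns all the lines that do not start with Computer Name, and so on.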
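It returns two things, a key and a group. The key is the result of the startswith() call, which is either True or False. The group is an iterable containing all the matching items. list(g) converts it into an ordinary list, which in this case holds all the lines up to the next Computer Name line.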
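To see the grouping on its own, here is a minimal sketch using a short, made-up list of lines (the PC names and contents are placeholders, not taken from the real file):

```python
from itertools import groupby

# Hypothetical lines, already stripped and with blank lines removed
lines = [
    "Computer Name: PC1",
    "ERROR: something",
    "Blah",
    "Computer Name: PC2",
    "Volume C:",
    "Numerical Password:",
]

# Each group is a run of consecutive lines for which the key is the same
groups = [(key, list(group))
          for key, group in groupby(lines, lambda x: x.startswith("Computer Name"))]

for key, block in groups:
    print(key, block)

# True ['Computer Name: PC1']
# False ['ERROR: something', 'Blah']
# True ['Computer Name: PC2']
# False ['Volume C:', 'Numerical Password:']
```

Note that groupby() only groups consecutive items, so the input does not need to be sorted here. The alternation of True and False groups is exactly what the script relies on.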
To write each ID/password entry to its own row and convert the known error messages into numbers:
from itertools import groupby
import xlsxwriter
import re
workbook = xlsxwriter.Workbook('Computer Data.xlsx')
worksheet = workbook.add_worksheet()
bold = workbook.add_format({'bold': True})
left = workbook.add_format({'align': 'justify'})
cols = [('Name', 14), ('Error', 5), ('ID', 38), ('Password', 55)]
for colx, (heading, width) in enumerate(cols):
    worksheet.write_string(0, colx, heading, bold)
    worksheet.set_column(colx, colx, width)
rowy = 1
lines = []
data = []
computer_name = None
error_numbers = {
    'An error occurred while connecting to the BitLocker management interface.' : 1,
    'No key protectors found.' : 2,
    'An error occurred (code 0x80070057):' : 3,
    'An error occurred (code 0x8004100e):' : 4}
with open('BLRP.txt') as f_input:
    # Keep only non-blank lines, stripped of surrounding whitespace
    lines = [line.strip() for line in f_input if len(line.strip())]
for k, g in groupby(lines, lambda x: x.startswith("Computer Name:")):
    block = list(g)
    if k:
        computer_name = re.search(r'Computer Name:\s*(.*)\s*', block[0]).group(1)
    elif computer_name:
        error_number = 0 # 0 for NO ERROR
        ids = []
        passwords = []
        for line_number, line in enumerate(block):
            re_error = re.match(r'ERROR:\s+?(.*)\s*?', line)
            if re_error:
                error = re_error.group(1)
                error_number = error_numbers.get(error, -1) # Return -1 for an unknown error
            if line.startswith('Numerical Password:'):
                # The ID is on the next line, the password three lines down
                ids.append(re.search(r'\{(.*?)\}', block[line_number+1]).group(1))
                passwords.append(block[line_number+3].strip())
        worksheet.write_string(rowy, 0, computer_name)
        worksheet.write_number(rowy, 1, error_number)
        for id, pw in zip(ids, passwords):
            worksheet.write_string(rowy, 0, computer_name)
            worksheet.write_number(rowy, 1, error_number)
            worksheet.write_string(rowy, 2, id)
            worksheet.write_string(rowy, 3, pw)
            rowy += 1 # Advance to the next output row
        if len(ids) == 0:
            rowy += 1 # Advance to the next output row
workbook.close()
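As for why the attempt in the question with two next(nonblank) calls produced a scrambled list: every call to next() on a generator consumes one line. The if test swallows the line after Computer Name, and when that test fails the elif test swallows yet another line. None of those lines ever reach the main for loop. The sketch below shows the effect on a made-up list of lines, followed by one possible way to look ahead without losing anything, by indexing into a list. The variable names and data are illustrative assumptions only.

```python
def nonblank_lines(f):
    # Yield each non-empty line with surrounding whitespace removed
    for l in f:
        line = l.strip()
        if line:
            yield line

# Hypothetical input; a list of strings stands in for the open file
raw = ["Computer Name: PC1", "", "Volume C:", "ID: {abc}", "Password:", "12345"]

# Each next() call pulls a *different* line from the generator
gen = nonblank_lines(raw)
first = next(gen)            # the Computer Name line
tested_by_if = next(gen)     # consumed by the `if` test
tested_by_elif = next(gen)   # consumed by the `elif` test
remaining = list(gen)        # all that the main loop would still see

# Looking ahead without consuming: read into a list and index it
lines = list(nonblank_lines(raw))
gold_lst = []
for i, line in enumerate(lines):
    if line.startswith("Computer Name"):
        gold_lst.append(line)
        following = lines[i + 1] if i + 1 < len(lines) else ""
        if not following.startswith("ERROR"):
            gold_lst.append("NO ERROR")

print(tested_by_if, "|", tested_by_elif, "|", remaining)
print(gold_lst)
```

The groupby() scripts above avoid the problem in the same way: they read all non-blank lines into a list first and never call next() by hand.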