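The code I wrote for a .txt file relies on a pattern in which every record has a start time and a completion time. When I tried it on a different .txt file that does not follow that pattern, it (obviously) broke. The code, and its output when it works, are below.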
import pprint  # Pretty-printer for Python data structures
import re  # Regular expressions

count = 0
d = {}  # d is an empty dictionary
# Open the file for reading; the file object yields one line at a time
with open(r"C:\Users\cqt7wny\Desktop\test.txt", "r") as file:
    for line in file:  # Read line by line
        # Skip record separators (long runs of = or *), blank lines, and the 'countriesshipped by day' line
        if '==' in line or "**" in line or not line.strip() or 'countriesshipped by day' in line:
            continue
        if 'STARTED' in line:  # This line contains the start time
            program_name, _ = line.split("STARTED")  # The pattern is <program name><space>STARTED<whatever>
            start_time = line.split(' ')[-1].strip()  # Split the line on spaces and take the last component
            d[count] = {'start_time': start_time}  # Initialize the nth record; count starts at 0
            continue
        if 'COMPLETED' in line:  # This line contains the end time
            end_time = line.split(' ')[-1].strip()
            d[count].update({'end_time': end_time})  # Store the end time
            count += 1
            continue
        # Every other line should contain = or :, so split on it to get a key/value pair
        try:
            x, y = re.split(r'=|:', line)
        except ValueError:  # The line did not split into exactly two parts
            print(line)
            continue  # Skip it rather than storing an empty key
        x = x.strip()  # Remove leading and trailing spaces from the key
        y = y.strip()  # Remove leading and trailing spaces from the value
        d[count].update({x: y})  # Put the key/value pair into d[count]
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(d)
Output:
{   0: {   'ADDR FOUND': '3169',
           'ADDR NOT FND': '0',
           'CALLS': '82',
           'ELIG SYS': '3762',
           'INELIG SYS': '7',
           'Program Name': 'program1',
           'REC READ': '265',
           'REC WRITTEN': '265',
           'SHPR FOUND': '69',
           'SHPR NOT FND': '3',
           'end_time': '2017-06-07-14.35.56.067879',
           'start_time': '2017-06-07-14.31.34.827086'},
    1: {   'ADDR FOUND': '31369',
           'ADDR NOT FND': '10',
           'CALLS': '32',
           'ELIG SYS': '762',
           'INELIG SYS': '471',
           'Program Name': 'program1',
           'REC READ': '165',
           'REC WRITTEN': '235',
           'SHPR FOUND': '649',
           'SHPR NOT FND': '23',
           'end_time': '2017-06-07-14.35.56.067879',
           'start_time': '2017-06-07-14.31.34.827086'},
    2: {   'ADDR FOUND': '3169',
           'ADDR NOT FND': '0',
           'CALLS': '82',
           'ELIG SYS': '3762',
           'INELIG SYS': '7',
           'Program Name': 'program1',
           'REC READ': '265',
           'REC WRITTEN': '265',
           'SHPR FOUND': '69',
           'SHPR NOT FND': '3',
           'end_time': '2017-06-07-14.35.56.067879',
           'start_time': '2017-06-07-14.31.34.827086'},
    3: {   'ADDR FOUND': '31369',
           'ADDR NOT FND': '10',
           'CALLS': '32',
           'ELIG SYS': '762',
           'INELIG SYS': '471',
           'Program Name': 'program1',
           'REC READ': '165',
           'REC WRITTEN': '235',
           'SHPR FOUND': '649',
           'SHPR NOT FND': '23',
           'end_time': '2017-06-07-14.35.56.067879',
           'start_time': '2017-06-07-14.31.34.827086'},
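
A likely reason for the failure on other files: with the code above, any key/value line that appears before a STARTED line reaches d[count].update(...) while d[count] does not exist yet, which raises a KeyError. Below is a minimal sketch of one way around that, assuming the same STARTED/COMPLETED and =/: conventions as the working file. The function name parse_records is hypothetical; dict.setdefault creates the record the first time it is needed.

```python
import re


def parse_records(lines):
    """Group key/value lines into numbered records.

    Hypothetical helper: tolerates input that has no STARTED marker
    by creating the current record on first use.
    """
    records = {}
    count = 0
    for line in lines:
        # Skip blank lines and separator lines made of = or *
        if not line.strip() or '==' in line or '**' in line:
            continue
        if 'STARTED' in line:
            records[count] = {'start_time': line.split(' ')[-1].strip()}
            continue
        if 'COMPLETED' in line:
            # setdefault avoids a KeyError when no STARTED line came first
            records.setdefault(count, {})['end_time'] = line.split(' ')[-1].strip()
            count += 1
            continue
        # Split on the first = or : only, so values may contain them
        parts = re.split(r'=|:', line, maxsplit=1)
        if len(parts) != 2:
            continue  # Not a key/value line
        key, value = (part.strip() for part in parts)
        records.setdefault(count, {})[key] = value
    return records
```

With this shape, a file that contains only key/value lines ends up as a single record 0 instead of crashing, and a file that follows the original pattern is grouped as before.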
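What I want to accomplish: my goal is to build a parser program that can scan any .txt file, regardless of its format, and retrieve specific user-defined information.
My plan/idea?
For the program to work on any text file, the user needs to know every detail of the information they want the program to scan for. In other words, the user tells the program what to search for; the program makes no assumptions.
I want the user running the program to:
1. Enter the file name.
2. Enter the program name (used as the starting point for the search).
3. Enter the delimiter (for the key/value pairs in the file).
4. Enter the keys whose values they need (the program goes through the lines, checks whether a key matches the line, and takes the value to the right of the delimiter).
So the program involves only a few steps.
My current code: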
file_name = input("File name : ")
program_name = input("Program name : ")  # Not used yet
delimiter = input("Delimiter : ")
fields = input("Fields : ")
field_list = [field.strip() for field in fields.split(",")]
d = []  # d is an empty list
# Open the file for reading; the file object yields one line at a time
with open(file_name, "r") as file:
    for line in file:  # Read line by line
        # Only split lines that contain the delimiter and one of the requested fields
        if delimiter in line and any(field in line for field in field_list):
            key, value = line.split(delimiter, 1)  # Split on the first delimiter only
            d.append({key.strip(): value.strip()})  # Append the key/value pair to d
print(d)
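
The plan above also uses the program name as the starting point of the search, which the current code does not do yet. Here is a rough sketch of how that step might look, not a finished design. The function name extract_fields and its parameters are hypothetical. It assumes a non-empty delimiter, assumes the program name appears somewhere on a line before the wanted keys, and matches keys exactly after stripping spaces.

```python
def extract_fields(lines, program_name, delimiter, fields):
    """Collect requested key/value pairs that appear at or after
    the first line mentioning program_name (hypothetical helper)."""
    if not delimiter:
        raise ValueError("delimiter must not be empty")
    wanted = {field.strip() for field in fields}
    found = {}
    started = False
    for line in lines:
        # Start collecting at the first line that mentions the program name
        if not started and program_name in line:
            started = True
        if not started or delimiter not in line:
            continue
        # Split on the first delimiter only, so values may contain it
        key, value = line.split(delimiter, 1)
        key = key.strip()
        if key in wanted:
            found[key] = value.strip()  # A later occurrence overwrites an earlier one
    return found
```

It could be called with the prompts from the current code, for example extract_fields(open(file_name), program_name, delimiter, fields.split(",")). Matching the key exactly, rather than testing whether the field occurs anywhere in the line, avoids picking up 'ADDR NOT FND' when only 'ADDR' was requested.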