Question

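Good morning!

The code I wrote for a .txt file follows a pattern with start/completed times. When I tried to see whether it would work on a different .txt file that does not follow that pattern, it (obviously) broke. The code, and its output when it works, are below.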

import pprint  # Pretty-printer for nested data structures
import re  # Regular expressions

count = 0
d = {}  # Maps record number -> dict of fields for that record

# Iterating over an open file yields one line at a time
with open(r"C:\Users\cqt7wny\Desktop\test.txt", "r") as file:
    for line in file:
        # Skip separator lines (runs of = or *), blank lines, and the report header
        if '==' in line or "**" in line or not line.strip() or 'countriesshipped by day' in line:
            continue

        if 'STARTED' in line:  # This line contains the start time
            # The pattern is <program name><space>STARTED<whatever>
            program_name, _ = line.split("STARTED")
            start_time = line.split(' ')[-1].strip()  # Split on spaces and take the last component
            d[count] = {'start_time': start_time}  # Initialize the nth record; count starts at 0
            continue

        if 'COMPLETED' in line:  # This line contains the end time
            end_time = line.split(' ')[-1].strip()
            d[count].update({'end_time': end_time})
            count += 1
            continue

        # Every other line should be a key/value pair separated by = or :
        try:
            x, y = re.split(r'=|:', line)
        except ValueError:  # The line did not split into exactly two parts
            x, y = ("", "")
            print(line)

        x = x.strip()  # Remove leading and trailing spaces from the key
        y = y.strip()  # Remove leading and trailing spaces from the value

        d[count].update({x: y})  # Raises KeyError if no STARTED line has opened this record

pp = pprint.PrettyPrinter(indent=4)
pp.pprint(d)

Output

{   0: {   'ADDR FOUND': '3169',
          'ADDR NOT FND': '0',
          'CALLS': '82',
          'ELIG   SYS': '3762',
          'INELIG SYS': '7',
          'Program Name': 'program1',
          'REC READ': '265',
          'REC WRITTEN': '265',
          'SHPR FOUND': '69',
          'SHPR NOT FND': '3',
          'end_time': '2017-06-07-14.35.56.067879',
          'start_time': '2017-06-07-14.31.34.827086'},
   1: {   'ADDR FOUND': '31369',
          'ADDR NOT FND': '10',
          'CALLS': '32',
          'ELIG   SYS': '762',
          'INELIG SYS': '471',
          'Program Name': 'program1',
          'REC READ': '165',
          'REC WRITTEN': '235',
          'SHPR FOUND': '649',
          'SHPR NOT FND': '23',
          'end_time': '2017-06-07-14.35.56.067879',
          'start_time': '2017-06-07-14.31.34.827086'},
   2: {   'ADDR FOUND': '3169',
          'ADDR NOT FND': '0',
          'CALLS': '82',
          'ELIG   SYS': '3762',
          'INELIG SYS': '7',
          'Program Name': 'program1',
          'REC READ': '265',
          'REC WRITTEN': '265',
          'SHPR FOUND': '69',
          'SHPR NOT FND': '3',
          'end_time': '2017-06-07-14.35.56.067879',
          'start_time': '2017-06-07-14.31.34.827086'},
   3: {   'ADDR FOUND': '31369',
          'ADDR NOT FND': '10',
          'CALLS': '32',
          'ELIG   SYS': '762',
          'INELIG SYS': '471',
          'Program Name': 'program1',
          'REC READ': '165',
          'REC WRITTEN': '235',
          'SHPR FOUND': '649',
          'SHPR NOT FND': '23',
          'end_time': '2017-06-07-14.35.56.067879',
          'start_time': '2017-06-07-14.31.34.827086'},
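
As far as I can tell, one likely reason the script breaks on other files is that d[count] is only created when a STARTED line is seen, so a key/value line in a file without STARTED lines raises a KeyError. A minimal sketch of a more tolerant update, assuming it is acceptable to create a record on demand (the helper name add_pair is made up for illustration):

```python
def add_pair(records, index, key, value):
    """Store key/value under records[index], creating the record if needed.

    Hypothetical helper: creating records on demand is an assumption.
    """
    records.setdefault(index, {})[key] = value


records = {}
add_pair(records, 0, "REC READ", "265")  # No STARTED line seen, but this still works
add_pair(records, 0, "CALLS", "82")
print(records)
```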

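What I want to accomplish: my goal is to make a parser program that can scan any .txt file, regardless of format, and retrieve specific user-defined information.

My plan/idea?

For the program to work on any text file, the user needs to know every detail of the information they want the program to scan for. In other words, the user tells the program what to search for; the program makes no assumptions.

I want the user running the program to 1. enter the file name, 2. enter the program name (used as the starting point of the search), 3. enter the delimiter (for the key-value pairs in the file), and 4. enter the keys whose values they need (the program goes through the lines to see whether a key matches a line, then takes the value to its right). So the program involves only a few steps.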

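1. Get from the user: a. file name, b. program name, c. delimiter, d. list of keys
2. Open the file and read it
3. Go through each line looking for key-value pairs
4. Split on the delimiter
5. Print the key and value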
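
The steps above can be sketched roughly as follows. This is only a sketch under assumptions: the function name extract_fields, the exact-match rule for keys (after stripping whitespace), and the sample lines are all made up for illustration, with key names borrowed from the output shown earlier.

```python
def extract_fields(lines, delimiter, wanted_keys):
    """Return (key, value) pairs for lines whose key is in wanted_keys.

    Hypothetical helper: the name and the exact-match rule are assumptions.
    """
    wanted = {key.strip() for key in wanted_keys}
    pairs = []
    for line in lines:
        # partition() never raises: if the delimiter is missing, sep is ''
        key, sep, value = line.partition(delimiter)
        if sep and key.strip() in wanted:
            pairs.append((key.strip(), value.strip()))
    return pairs


# Made-up sample lines; the real file layout may differ
sample = [
    "==========",
    "REC READ = 265",
    "CALLS = 82",
    "no delimiter here",
]
for key, value in extract_fields(sample, "=", ["REC READ", "CALLS"]):
    print(key, value)
```

Using partition instead of unpacking the result of split means lines with no delimiter, or with more than one, are handled without an exception.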

My current code:

file_name = input("File name : ")
program_name = input("Program name : ")  # Collected but not used yet
delimiter = input("Delimiter : ")

fields = input("Fields : ")
# Comma-separated keys to look for; ignore surrounding spaces and empty entries
field_list = [field.strip() for field in fields.split(",") if field.strip()]

d = []  # List of {key: value} dicts, one per matching line

with open(file_name, "r") as file:  # Iterating over the file yields one line at a time
    for line in file:
        if delimiter in line and any(field in line for field in field_list):
            # Split on the first delimiter only, so the value may contain the delimiter
            key, value = line.split(delimiter, 1)
            d.append({key.strip(): value.strip()})

print(d)
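
The program name is read above but never used in the loop. One possible way to use it as the start of the search is to skip everything before the first line that mentions it. This is a sketch only: the helper name lines_from_program, the "first mention" rule, and the sample lines are assumptions, not part of the code above.

```python
from itertools import dropwhile


def lines_from_program(lines, program_name):
    """Return an iterator over lines, starting at the first one containing program_name.

    Hypothetical helper: treating the first mention as the start is an assumption.
    """
    return dropwhile(lambda line: program_name not in line, lines)


# Made-up sample lines; the real file layout may differ
sample = [
    "some header",
    "program1 STARTED AT 2017-06-07-14.31.34.827086",
    "REC READ = 265",
]
for line in lines_from_program(sample, "program1"):
    print(line)
```

Because an open file is itself an iterable of lines, the same helper could wrap the file object before the key matching loop.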

Parsing a large .txt with user-defined input - Python

0 answers: