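I'm writing a Python script to extract the important information from an old text file (actually an a2l file, which is why I call it a2l in my code) and append everything to a new text file. The code works, but I'm sure there is a better, cleaner way to structure it (I have no real experience with text processing or regular expressions, but I'd love to learn).
The old text file (a2l) is rather long, so I'll try to explain how my code works. There are basically two groups, GROUP1 and GROUP2. My goal is to extract their Name attribute (stored as "val") and to generate a more readable name from the original one (stored as "name"). After that, I want to do a simple classification based on the data type or the name.
Here is a sample of the old text file: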
/begin GROUP1
/* Name */ FirstElem_Val
/* Long Identifier */ "first input value to be configured"
/* Type */ VALUE
/* ECU Address */ 0x080136c6
/* Record Layout */ Scalar_BOOLEAN
/* Maximum Difference */ 0
/* Conversion Method */ ECU_boolean_0_0_1_0
/* Lower Limit */ 0
/* Upper Limit */ 1
SYMBOL_LINK "FirstElem_Val" 0
/end GROUP1
/begin GROUP1
/* Name */ FirstElem_Err
/* Long Identifier */ "first input error to be configured"
/* Type */ VALUE
/* ECU Address */ 0x080136c7
/* Record Layout */ Scalar_BOOLEAN
/* Maximum Difference */ 0
/* Conversion Method */ ECU_boolean_0_0_1_0
/* Lower Limit */ 0
/* Upper Limit */ 1
SYMBOL_LINK "FirstElem_Err" 0
/end GROUP1
/begin GROUP1
/* Name */ SecondElem
/* Long Identifier */ "second input to be configured"
/* Type */ VALUE
/* ECU Address */ 0x080134ec
/* Record Layout */ Scalar_FLOAT32_IEEE
/* Maximum Difference */ 0
/* Conversion Method */ ECU_single_second_0_0_1_0
/* Lower Limit */ 0
/* Upper Limit */ 10
SYMBOL_LINK "SecondElem" 0
/end GROUP1
/begin GROUP2
/* Name */ ThirdElem_Val
/* Long identifier */ ""
/* Data type */ UBYTE
/* Conversion method */ ECU_uint8_0_0_1_0
/* Resolution (Not used) */ 0
/* Accuracy (Not used) */ 0
/* Lower limit */ 0
/* Upper limit */ 255
ECU_ADDRESS 0x0801355a
SYMBOL_LINK "ThirdElem_Val" 0
/end GROUP2
/begin GROUP2
/* Name */ ThirdElem_Sta
/* Long identifier */ ""
/* Data type */ UBYTE
/* Conversion method */ ECU_SignalStatusEnum_0_0_1_0
/* Resolution (Not used) */ 0
/* Accuracy (Not used) */ 0
/* Lower limit */ 0
/* Upper limit */ 3
ECU_ADDRESS 0x08013698
SYMBOL_LINK "ThirdElem_Sta" 0
/end GROUP2
/begin GROUP2
/* Name */ FourthElem
/* Long identifier */ ""
/* Data type */ UWORD
/* Conversion method */ ECU_uint16_0_0_1_0
/* Resolution (Not used) */ 0
/* Accuracy (Not used) */ 0
/* Lower limit */ 0
/* Upper limit */ 65535
ECU_ADDRESS 0x080135a6
SYMBOL_LINK "FourthElem" 0
/end GROUP2
And here is my lengthy Python code:
import re
import os

def putSpace(line):
    return re.sub(r"(\w)([A-Z])", r"\1 \2", line)

def generator(file):
    char_count = 0 #characteristics
    meas_count = 0 #measurement
    with open(file, 'r+') as a2l:
        for line in a2l:
            generated = open('new.txt','a+')
            if line.find('/begin GROUP1')>=0:
                char_count += 1
                name_line = next(a2l, '').strip()
                val = name_line.replace('/* Name */ ','')
                name = name_line.replace('/* Name */ ','')
                name = putSpace(name)
                # a bunch of replacements to make it readable
                name = name.replace('_','')
                name = name.replace('Elem','Element')
                name = name.replace('Val','Value')
                name = name.replace('Err','Error')
                for i in range(4):
                    char_type = next(a2l, '').strip() #record layout
                if 'FLOAT32' in char_type:
                    generated_out = 'text: \'%s\' \'%s\'' % (name,val)
                    generated.write(str(generated_out)+'\n')
                elif 'Scalar_BOOLEAN' in char_type:
                    #first element has two types
                    if 'Val' in val:
                        generated_out = 'bool-HL \'%s\' \'%s\'' % (name,val)
                        generated.write(str(generated_out)+'\n')
                    elif 'Err' in val:
                        generated_out = 'bool-err \'%s\' \'%s\'' % (name,val)
                        generated.write(str(generated_out)+'\n')
            elif line.find('/begin GROUP2')>=0:
                meas_count += 1
                name_line = next(a2l, '').strip()
                val = name_line.replace('/* Name */ ','')
                name = name_line.replace('/* Name */ ','')
                name = putSpace(name)
                # a bunch of replacements to make it readable
                name = name.replace('_','')
                name = name.replace('Elem','Element')
                name = name.replace('Val','Value')
                name = name.replace('Sta','Status')
                for i in range(3):
                    meas_type = next(a2l, '').strip()
                if 'uint' in meas_type:
                    generated_out = 'text: \'%s\' \'%s\'' % (name,val)
                    generated.write(str(generated_out)+'\n')
                elif 'Enum' in meas_type:
                    generated_out = 'enum: \'%s\' \'%s\'' % (name,val)
                    generated.write(str(generated_out)+'\n')
    print('group1: ',char_count,' group2: ',meas_count)

path = r'C:[file directory]'
dirs = os.listdir( path )
for file in sorted(dirs):
    #print(file)
    #if file==""
    try:
        generator(file)
    except:
        print('not found')
This is the output, which must not change:
bool-HL 'First Element Value' 'FirstElem_Val'
bool-err 'First Element Error' 'FirstElem_Err'
text: 'Second Element' 'SecondElem'
text: 'Third Element Value' 'ThirdElem_Val'
enum: 'Third Element Status' 'ThirdElem_Sta'
text: 'Fourth Element' 'FourthElem'
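Answer 0 (score: 3)
Complex text processing tends to end up looking roughly like your code. Here are some suggestions along the lines you asked for.
Use f-strings (string literals prefixed with "f") instead of "%" for formatting. They are clearer and less error-prone. For example,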
generated_out = 'bool-HL \'%s\' \'%s\'' % (name,val)  # not the best way
generated_out = f"bool-HL '{name}' '{val}'"  # the same, but a little better
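I wrapped the whole string in double quotes so that the single quotes inside it no longer need a backslash (incidentally, before Python 3.12 backslashes were not allowed inside the {} expression parts of an f-string). Then I used the "f" prefix to format the string in place: it automatically picks up name and val from the current scope.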
Here is the documentation: https://docs.python.org/3/reference/lexical_analysis.html#f-strings
And here is a nice article: https://realpython.com/python-f-strings/
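As a quick check of the point above, here is a minimal runnable sketch showing that the two formatting styles build the same string, and how the summary print at the end of generator could be written the same way. The sample values are taken from the question's data; the variable names old_style, new_style and summary are only illustrative.

```python
name = "First Element Value"
val = "FirstElem_Val"

# %-formatting needs escaped quotes and a separate tuple of arguments.
old_style = 'bool-HL \'%s\' \'%s\'' % (name, val)

# The f-string reads the variables directly from the enclosing scope.
new_style = f"bool-HL '{name}' '{val}'"

print(old_style == new_style)  # True: both styles build the same string

# The counters can be formatted the same way (3 and 3 for the sample file).
char_count, meas_count = 3, 3
summary = f"group1: {char_count} group2: {meas_count}"
print(summary)
```

Note that print('group1: ', char_count, ...) in the question inserts an extra space between its arguments, so the f-string version is spaced slightly differently from the original console message.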
Then, for clarity, I would write two or three small local helper functions (like your putSpace), for example:
def store():
    # no str() conversion is needed because generated_out is already a string
    generated.write(generated_out+'\n')

def change(source, changes):
    result = source
    for old, new in changes.items():
        result = result.replace(old, new)
    return result

# and somewhere in the code, instead of…
name = name.replace('_','')
name = name.replace('Elem','Element')
name = name.replace('Val','Value')
name = name.replace('Sta','Status')
# …you can write something like this
name = change(name, {
    '_': '',
    'Elem': 'Element',
    'Val': 'Value',
    'Sta': 'Status',
})
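To see the change helper in action, here is a small self-contained sketch that combines it with putSpace from the question. One detail worth knowing: the replacements run in the dict's insertion order (guaranteed since Python 3.7), and each rule sees the result of the previous ones. The RULES table is an assumption for illustration; it simply merges the replacements used for both groups.

```python
import re


def putSpace(line):
    # Insert a space before each capital letter that follows a word character.
    return re.sub(r"(\w)([A-Z])", r"\1 \2", line)


def change(source, changes):
    # Apply every old -> new replacement in the dict's insertion order.
    result = source
    for old, new in changes.items():
        result = result.replace(old, new)
    return result


# Hypothetical merged table covering the abbreviations of both groups.
RULES = {"_": "", "Elem": "Element", "Val": "Value", "Err": "Error", "Sta": "Status"}

for val in ("FirstElem_Val", "FirstElem_Err", "ThirdElem_Sta", "FourthElem"):
    print(change(putSpace(val), RULES))

# Later rules operate on the output of earlier ones, so order matters:
print(change("a", {"a": "b", "b": "c"}))  # prints "c", not "b"
```

Because of that cascading behaviour, it is safest to keep the table small and to make sure no replacement text contains a key that appears later in the dict.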
I would also give variables shorter names, for example "output" instead of "generated_out".
Good luck, and feel free to ask questions!
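Putting these suggestions together, one possible restructuring is sketched below. It is only an illustration under some assumptions, not a drop-in replacement: it reads the whole text at once, assumes the GROUP blocks are not nested, looks fields up by their comment label instead of counting lines, and merges the replacement rules of both groups. All names in it (EXPANSIONS, BLOCK_RE, FIELD_RE, readable, parse_fields, classify, convert) are hypothetical.

```python
import re

# Merged replacement rules, applied in insertion order (assumption: no name
# needs group-specific handling).
EXPANSIONS = {"_": "", "Elem": "Element", "Val": "Value", "Err": "Error", "Sta": "Status"}

# One /begin GROUPn ... /end GROUPn block; assumes blocks are not nested.
BLOCK_RE = re.compile(r"/begin (GROUP[12])\b(.*?)/end \1\b", re.DOTALL)
# A line such as: /* Record Layout */ Scalar_BOOLEAN
FIELD_RE = re.compile(r"/\*\s*(.+?)\s*\*/\s*(.*)")


def readable(val):
    # Same steps as the question: split on capitals, then expand abbreviations.
    name = re.sub(r"(\w)([A-Z])", r"\1 \2", val)
    for old, new in EXPANSIONS.items():
        name = name.replace(old, new)
    return name


def parse_fields(body):
    # Map each lower-cased comment label to the value that follows it.
    fields = {}
    for line in body.splitlines():
        match = FIELD_RE.match(line.strip())
        if match:
            fields[match.group(1).lower()] = match.group(2)
    return fields


def classify(group, fields):
    # Return the output prefix for a block, or None if it matches no rule.
    val = fields["name"]
    if group == "GROUP1":
        layout = fields.get("record layout", "")
        if "FLOAT32" in layout:
            return "text:"
        if "Scalar_BOOLEAN" in layout:
            if "Val" in val:
                return "bool-HL"
            if "Err" in val:
                return "bool-err"
    else:
        method = fields.get("conversion method", "")
        if "uint" in method:
            return "text:"
        if "Enum" in method:
            return "enum:"
    return None


def convert(text):
    # Build the output lines for every recognised block in the text.
    lines = []
    for group, body in BLOCK_RE.findall(text):
        fields = parse_fields(body)
        prefix = classify(group, fields)
        if prefix is not None:
            val = fields["name"]
            lines.append(f"{prefix} '{readable(val)}' '{val}'")
    return lines


SAMPLE = """/begin GROUP1
/* Name */ FirstElem_Err
/* Long Identifier */ "first input error to be configured"
/* Type */ VALUE
/* ECU Address */ 0x080136c7
/* Record Layout */ Scalar_BOOLEAN
/end GROUP1
/begin GROUP2
/* Name */ ThirdElem_Sta
/* Long identifier */ ""
/* Data type */ UBYTE
/* Conversion method */ ECU_SignalStatusEnum_0_0_1_0
/end GROUP2
"""

for line in convert(SAMPLE):
    print(line)
```

On this two-block sample the sketch prints the bool-err and enum: lines from the expected output. To write to a file, the returned lines could be joined with newlines and written once, inside a with open(...) block, instead of reopening new.txt for every input line.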