I have an S19 file that looks like this:
S0030000FC
S30D0003C0000F0000000000000020
S3FD00000000782EFF1FB58E00003D2B00003D2B00003D2B00003D2B00003D2B00003D
S3ED000000F83D2B00003D2B00003D2B00003D2B00003D2B00003D2B00003D2B00003D
S31500000400FFFFFFFFFFFFFFFFFFFFFFFF7EF9FFFF7D
S3FD0000041010B5DFF828000468012147F22C10C4F20300016047F22010C4F2030000
S70500008EB4B8
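I want to split off the first two characters, then the next two, and so on. I'd like the output to look like this (the last two characters of each line should be split off as well):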
S0, 03, 0000, FC
S3, 0D, 0003C000, 0F00000000000000, 20
S3, FD, 00000000, 782EFF1FB58E00003D2B00003D2B00003D2B00003D2B00003D2B0000, 3D
S3, ED, 000000F8, 3D2B00003D2B00003D2B00003D2B00003D2B00003D2B00003D2B0000, 3D
S3, 15, 00000400, FFFFFFFFFFFFFFFFFFFFFFFF7EF9FFFF, 7D
S3, FD, 00000410, 10B5DFF828000468012147F22C10C4F20300016047F22010C4F20300, 00
S7, 05, 00008EB4, B8
How can I do this in Python? This is what I have so far:
#!/usr/bin/python
import string,os,sys,re,fileinput
print "hi"
inputfile = "k60.S19"
outputfile = "k60_out.S19"
# open the source file and read it
fh = file(inputfile, 'r')
subject = fh.read()
fh.close()
# create the pattern object. Note the "r". In case you're unfamiliar with Python
# this is to set the string as raw so we don't have to escape our escape characters
pattern2 = re.compile(r'S3')
pattern3 = re.compile(r'S7')
pattern1 = re.compile(r'S0')
# do the replace
result1 = pattern1.sub("S0, ", subject)
result2 = pattern2.sub("S3, ", subject)
result3 = pattern3.sub("S7, ", subject)
# write the file
f_out = file(outputfile, 'w')
f_out.write(result1)
f_out.write(result2)
f_out.write(result3)
f_out.close()
#EoF
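But I don't like it! Can someone help me work out the right regular expression?
Answer 0 (score: 2)
Try the bincopy package; it may be what you need.
bincopy - Interpret strings as packed binary data
It handles various file formats that carry binary information (Motorola S-Record, Intel HEX and binary files).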
import bincopy
f = bincopy.BinFile()
f.add_srec_file("path/to/your/s19/file.s19")
f.as_binary()  # returns the S19 contents as binary data (it does not print by itself)
Or you can simply read the file with open() and slice each line:
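with open("path/to/your/s19/file.s19") as s19:
    for line in s19:
        line = line.strip()  # drop the trailing newline so the last slice is the checksum
        if not line:
            continue
        rec_type = line[0:2]
        count = line[2:4]
        # Address width depends on the record type: 8 hex digits for S3/S7,
        # 6 for S2/S8, and 4 for the rest (for example S0, S1, S9).
        addr_len = {"S3": 8, "S7": 8, "S2": 6, "S8": 6}.get(rec_type, 4)
        address = line[4:4 + addr_len]
        data = line[4 + addr_len:-2]
        checksum = line[-2:]
        # Skip empty fields (for example, a record that carries no data).
        print(", ".join(f for f in (rec_type, count, address, data, checksum) if f))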
Hope that helps.
Motorola S-record file format
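The last field that the slicing code separates is the record checksum. In the S-record format it is the ones' complement of the low byte of the sum of the count, address and data bytes. Here is a minimal sketch for checking it; the function name `srec_checksum_ok` is made up for illustration and is not part of bincopy or any other library.

```python
def srec_checksum_ok(line):
    """Return True if an S-record line has a consistent count and checksum."""
    line = line.strip()
    raw = bytes.fromhex(line[2:])      # count, address, data, checksum
    count, checksum = raw[0], raw[-1]
    if count != len(raw) - 1:          # count covers everything after itself
        return False                   # e.g. a truncated record
    # Ones' complement of the low byte of the sum of count, address and data.
    return (~sum(raw[:-1]) & 0xFF) == checksum


print(srec_checksum_ok("S0030000FC"))
print(srec_checksum_ok("S70500008EB4B8"))
```

The long `S3FD...` and `S3ED...` lines in the question are shorter than their count byte says, so this check rejects them as posted.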
Answer 1 (score: 0)
You can use re.sub with a callback function as the replacement:
#!/usr/bin/python
import re
data = r'''S0030000FC
S30D0003C0000F0000000000000020
S3FD00000000782EFF1FB58E00003D2B00003D2B00003D2B00003D2B00003D2B00003D
S3ED000000F83D2B00003D2B00003D2B00003D2B00003D2B00003D2B00003D2B00003D
S31500000400FFFFFFFFFFFFFFFFFFFFFFFF7EF9FFFF7D
S3FD0000041010B5DFF828000468012147F22C10C4F20300016047F22010C4F2030000
S70500008EB4B8'''
pattern = re.compile(r'^(..)(..)((?:.{4}){1,2})(.*)(?=..)', re.M)
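def repl(m):
    # Join every non-empty captured group, each followed by ', '.
    # The checksum is not captured (it sits in the lookahead), so it stays
    # in place after the last ', '.
    repstr = ''
    for g in m.groups():
        if g:
            repstr += g + ', '
    return repstr

print(re.sub(pattern, repl, data))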
However, as Mark Setchell noted, slicing is probably a good way to do this.
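If you do want to stay with a regular expression, one possible variant ties the address width to the record type instead of relying on `{1,2}` backtracking. This is only a sketch: the names `RECORD` and `split_records` are made up here, and the grouping of record types by address width (8 hex digits for S3/S7, 6 for S2/S6/S8, 4 otherwise) is my reading of the S-record format, not something taken from the question.

```python
import re

# One alternative per address width; groups of the alternatives that did
# not match come back as None.
RECORD = re.compile(
    r"^(S[37])(..)(.{8})(.*)(..)$"      # 32-bit address
    r"|^(S[268])(..)(.{6})(.*)(..)$"    # 24-bit address or count
    r"|^(S[0159])(..)(.{4})(.*)(..)$",  # 16-bit address or count
    re.MULTILINE,
)


def split_records(text):
    def repl(match):
        # Drop unmatched (None) groups and empty data fields.
        return ", ".join(g for g in match.groups() if g)
    return RECORD.sub(repl, text)


sample = "S0030000FC\nS30D0003C0000F0000000000000020\nS70500008EB4B8"
print(split_records(sample))
```

Because the checksum is captured as its own group, the callback can build the whole line with a single `join` and no lookahead is needed.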
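Answer 2 (score: 0)
I know you are thinking of Python and regular expressions, but this is the kind of job awk was made for, and the following may help you work out how to do it with slicing: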
awk '{r=length($0);print substr($0,1,2),substr($0,3,2),substr($0,5,8),substr($0,13,r-14),substr($0,r-1)}' OFS=, k60.s19
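That says "get the length of the line into the variable r, then print the first two characters, the next two characters, the next 8 characters, and so on, using a comma as the output field separator".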
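For comparison, the same one-liner can be written with Python slices. awk's `substr` is 1-based and takes a length, while Python slices are 0-based and take an end index, so each offset shifts by one. The function name `split_like_awk` is made up for this sketch. Like the awk command, it assumes an 8-character address, so the short S0 line will not come out as the question asks.

```python
def split_like_awk(line):
    """Mirror the awk command: fixed-width fields joined by commas."""
    line = line.rstrip("\r\n")
    r = len(line)                 # r = length($0)
    fields = [
        line[0:2],                # substr($0, 1, 2)
        line[2:4],                # substr($0, 3, 2)
        line[4:12],               # substr($0, 5, 8)
        line[12:r - 2],           # substr($0, 13, r-14)
        line[r - 2:],             # substr($0, r-1)
    ]
    return ",".join(fields)       # OFS=,


print(split_like_awk("S30D0003C0000F0000000000000020"))
```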
EDITED
Here are some hints to get you started...
If you want to avoid printing line 1, you can do
awk 'FNR==1{next} ...rest of awk script above ... '
If you only want to process lines longer than 40 characters, you can do
awk 'length($0)>40 {print}' yourfile
If you only want to process lines whose second field is "xx", you can do
awk '$2 ~ "xx" {print}' yourfile
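If you end up doing this in Python after all, the three filters above could look roughly like this. The generator name `select_lines` and its parameters are made up for illustration. I am also assuming that "second field" means the two count characters of the record (characters 3 and 4) and using a plain equality test, whereas awk's `~` is a regex match on a whitespace-separated field.

```python
def select_lines(lines, skip_first=False, min_length=None, count=None):
    """Yield the lines that pass the requested filters."""
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if skip_first and number == 1:
            continue              # like: FNR==1{next}
        if min_length is not None and len(line) <= min_length:
            continue              # like: length($0)>40
        if count is not None and line[2:4] != count:
            continue              # assumed stand-in for: $2 ~ "xx"
        yield line


sample = [
    "S0030000FC",
    "S30D0003C0000F0000000000000020",
    "S31500000400FFFFFFFFFFFFFFFFFFFFFFFF7EF9FFFF7D",
    "S70500008EB4B8",
]
for line in select_lines(sample, skip_first=True, min_length=20):
    print(line)
```

An open file object can be passed as `lines`, since the function only iterates over it.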