I have a text file that contains the following string in certain places.
20170818_141903 Test ! Vdd 3.000000; P: 20.000000;T 20.282000;Part: 0; Baud Rate: 9620.009620; Message: MMS111111110001110100000000000100100000000000000000000000000100010000000000000000000001000000000010000000000001000000100000000010000011000000000000000000000000000000000000000000000000000000000000000000000000000000000000011001001001110001010001000000000111011011001010110000000000000010000001101100000000000000000000011011111010000100111101000000000111111110000111110010110000000010001001101110000101000000000000110010010000000000000000000000000000000000001000000000000000001000000000010000001000000000000000000000000000000000000000000100010000000000000101010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010100101111111010111000000110100000000101000110000100010101010011010000000000000100010001100000000110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000SS
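Unfortunately, it is not comma- or tab-delimited; each line is one big string.
I have read in the whole file and am trying to extract all of the binary data.
That is, I want everything between the following markers: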
MMS ...... SS
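One way to collect every such payload at once is `re.findall` with a single capturing group, which returns only the captured text for each match. This is a sketch that assumes each payload between MMS and SS contains nothing but 0 and 1 characters; `SAMPLE` is a hypothetical, shortened stand-in for the real file contents:

```python
import re

# Hypothetical, shortened stand-in for the text returned by f.read()
SAMPLE = (
    "20170818_141903 Test ! Vdd 3.000000; P: 20.000000; Message: MMS0110SS\n"
    "20170818_141904 Test ! Vdd 3.100000; P: 21.000000; Message: MMS1001110SS\n"
)

# Capture only the run of 0/1 characters between the literal markers "MMS" and "SS".
PAYLOAD = re.compile(r'MMS([01]+)SS')

def extract_payloads(text):
    """Return every binary payload found in text, in order of appearance."""
    return PAYLOAD.findall(text)

payloads = extract_payloads(SAMPLE)
print(payloads)  # ['0110', '1001110']
```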
I would also like to extract the values that follow, for example, P: or Vdd in these regions:
Vdd 3.000000; P: 20.000000...........................etc
What I have done so far:
import re

# LONG_STRING holds one line read from the file
match = re.search(r'P: (\w+)', LONG_STRING)
if match:
    print(match.group(1))
However, this does not extract the full floating-point number; it drops the decimal places.
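The decimal places are lost because `\w` matches only letters, digits and the underscore, so the match ends at the '.'. A minimal sketch of a pattern with an optional fractional part, assuming the values are non-negative and written in plain decimal notation (no sign or exponent); the shortened `line` is a hypothetical sample:

```python
import re

line = "20170818_141903 Test ! Vdd 3.000000; P: 20.000000;T 20.282000;Part: 0;"

# \w matches letters, digits and underscore only, so it stops at the '.'.
print(re.search(r'P: (\w+)', line).group(1))  # 20

# Digits, optionally followed by a '.' and more digits.
FLOAT = r'(\d+(?:\.\d+)?)'
p = re.search(r'P: ' + FLOAT, line).group(1)
vdd = re.search(r'Vdd ' + FLOAT, line).group(1)
print(p, vdd)        # 20.000000 3.000000
print(float(p) + 1)  # 21.0
```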
Answer 0 (score: 1)
Answer v2.0. Overall, this code is quite rigid and not the cleanest, but for the sample you provided I can't offer a better solution at the moment.
>>> import re
>>> that_long_row = "20170818_141903 Test ! Vdd 3.000$000; P: 20.000000;T 20.282000;Part: 0; Baud Rate: 9620.009620; Message: MMS11111111000111010000000000010010000000000000000000000000010001000000000000000000000100000000001000000000000100000010000000001000001100000000000000000000000000000000000000000000000000000000000000000000000000000000000001100100100111000101000100000000011101101100101011000000000000001000000110110000000000000000000001101111101000010011110100000000011111111000011111001011000000001000100110111000010100000000000011001001000000000000000000000000000000000000100000000000000000100000000001000000100000000000000000000000000000000000000000010001000000000000010101000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001010010111111101011100000011010000000010100011000010001010101001101000000000000010001000110000000011000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000SS "
>>> regex = (r'^'                    # start of the string
...          r'.+'                   # skip one or more arbitrary characters
...          r'Vdd '                 # until "Vdd " is reached
...          r'(?P<Vdd>[0-9\.]+)'    # capture the run of digits and dots following that word as group "Vdd"
...          r'.+'                   # again, skip some more characters
...          r'P: '                  # find "P: "
...          r'(?P<P>[0-9\.]+)'      # capture the run of digits and dots as group "P"
...          r'.+'                   # same approach for the binary "Message" between "MMS" and "SS"
...          r'MMS'
...          r'(?P<Message>[0-1]+)'  # except that this group matches only 0 and 1
...          r'SS'
...          r'.*'                   # as @Evan mentioned, allow possible trailing characters (zero or more)
...          r'$'                    # end of the string
...          )
# the same but in a compact form:
>>> regex = r'^.+Vdd (?P<Vdd>[0-9\.]+).+P: (?P<P>[0-9\.]+).+MMS(?P<Message>[0-1]+)SS.*$'
>>> match = re.match(regex, that_long_row)
# match.groupdict() returns an ordinary dict that maps each named group
# to the text it matched, so every value can be looked up by its group name
>>> match.groupdict()
{'Vdd': '3.000', 'P': '20.000000', 'Message': ...
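Instead of concatenating commented string fragments, Python's `re.VERBOSE` flag lets the pattern itself carry comments: unescaped whitespace and `#` comments inside the pattern are ignored, so literal spaces have to be escaped (`\ `). A sketch of the same pattern in that form, using lazy `.+?` skips and a hypothetical shortened row that ends right after "SS":

```python
import re

PATTERN = re.compile(r"""
    ^.+?                        # skip the timestamp and test name
    Vdd\ (?P<Vdd>[0-9.]+)       # number after 'Vdd '
    .+?
    P:\ (?P<P>[0-9.]+)          # number after 'P: '
    .+?
    MMS(?P<Message>[01]+)SS     # binary payload between the markers
    .*$                         # allow anything, or nothing, after 'SS'
""", re.VERBOSE)

row = "20170818_141903 Test ! Vdd 3.000000; P: 20.000000;T 20.282000; Message: MMS0110SS"

fields = PATTERN.match(row).groupdict()
print(fields)  # {'Vdd': '3.000000', 'P': '20.000000', 'Message': '0110'}

# groupdict() values are strings; convert the numeric ones explicitly.
vdd, p = float(fields['Vdd']), float(fields['P'])
print(vdd, p)  # 3.0 20.0
```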
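Next, if you want to parse the file this way, I would create a simple class to hold all the data and take care of type conversion, validation, etc.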
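import re

# the compact pattern from above; the trailing .* (zero or more) also
# accepts lines that end right after "SS"
regex = r'^.+Vdd (?P<Vdd>[0-9\.]+).+P: (?P<P>[0-9\.]+).+MMS(?P<Message>[0-1]+)SS.*$'

class Message:
    def __init__(self, Vdd, P, Message):
        self.vdd = float(Vdd)  # raises ValueError for malformed numbers such as '3.0.0'
        self.p = float(P)
        self.text = Message    # binary payload between MMS and SS, kept as a string

data = []
with open('yourfile', 'r') as f:
    for line in f:
        match = re.match(regex, line)
        if match is None:
            # the line does not have the expected layout at all
            data.append('CORRUPTED')
            continue
        try:
            data.append(Message(**match.groupdict()))
        except ValueError:
            # a field matched the pattern but is not a valid float
            data.append('CORRUPTED')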
And so on.
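Because the header fields in each row are separated by semicolons, a less rigid alternative to one large regex (a different technique, not the approach used above) is to split each row on ';' and treat every piece as a key/value pair. A minimal sketch, assuming every field looks like `Key: value` or `Key value`, and that the first field carries the timestamp and test label before `Vdd`; `parse_row` is a hypothetical helper name:

```python
def parse_row(row):
    """Split one row on ';' into a dict of field name -> value string."""
    fields = {}
    for piece in row.strip().split(';'):
        piece = piece.strip()
        if not piece:
            continue
        if ':' in piece:
            # e.g. 'P: 20.000000', 'Baud Rate: 9620.009620', 'Message: MMS...SS'
            key, value = piece.split(':', 1)
        else:
            # e.g. 'T 20.282000', or the first field, which still carries the
            # timestamp/test prefix: keep only the last word as the key
            key, value = piece.rsplit(' ', 1)
            key = key.split()[-1]
        fields[key.strip()] = value.strip()
    return fields

row = ("20170818_141903 Test ! Vdd 3.000000; P: 20.000000;T 20.282000;"
       "Part: 0; Baud Rate: 9620.009620; Message: MMS0110SS\n")
fields = parse_row(row)
print(fields['Vdd'], fields['P'], fields['Baud Rate'])  # 3.000000 20.000000 9620.009620

# Strip the MMS/SS markers to get the bare binary payload.
message = fields['Message']
payload = message[3:-2] if message.startswith('MMS') and message.endswith('SS') else None
print(payload)  # 0110
```

With this layout, extra or reordered fields simply show up as additional dictionary keys instead of making the whole match fail.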