I have a text document to parse. I want to get the strings between "@5c00\n" and "@ffd2\n", and between "@ffd2\n" and "@".
@5c00
81 00 00 5C B1 13 3E 01 0C 43 B1 13 A6 00 1C 43
B1 13 38 01 32 D0 10 00 FD 3F 03 43 00 00 00 02
@ffd2
14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C
14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C
14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 00 5C CF 0C
@
q
I tried using a regular expression, but it seems to give me ['', ''].
file = open("app_blink.txt","r") #app_blink.txt being the string above
contents = file.read()
data = re.findall('\n(.*)@',contents,re.M)
I expected to get:
data
['81 00 00 5C B1 13 3E 01 0C 43 B1 13 A6 00 1C 43 \nB1 13 38 01 32 D0 10 00..
FD 3F 03 43 00 00 00 02','14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C..
\n14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C \n14 5C 14 5C 14 5C 14..
5C 14 5C 14 5C 00 5C CF 0C \n']
But I actually got:
data
['','']
Answer 0 (score: 1)
You're close. You need the re.DOTALL flag and a non-greedy match:
contents = '''\
@5c00
81 00 00 5C B1 13 3E 01 0C 43 B1 13 A6 00 1C 43
B1 13 38 01 32 D0 10 00 FD 3F 03 43 00 00 00 02
@ffd2
14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C
14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C
14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 00 5C CF 0C
@
q
'''
import re
# re.DOTALL lets '.' cross newlines; '.*?' stops at the first '@' it reaches
for x in re.findall(r'\n(.*?)@', contents, re.DOTALL):
    print(x)
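To see why the original pattern returned ['', ''], here is a minimal sketch on a shortened sample (the variable names and the trimmed data are made up for illustration). Without re.DOTALL, '.' cannot cross a newline, so the only way '\n(.*)@' can match is with an empty group sitting right before a line that begins with @.

```python
import re

# Shortened stand-in for the file contents (illustrative data only)
sample = (
    "@5c00\n"
    "81 00 00 5C\n"
    "B1 13 38 01\n"
    "@ffd2\n"
    "14 5C 14 5C\n"
    "@\n"
    "q\n"
)

# Original attempt: '.' stops at newlines, so the group can only be empty
without_dotall = re.findall(r'\n(.*)@', sample, re.M)

# Suggested fix: DOTALL plus a lazy quantifier captures each whole section
with_dotall = re.findall(r'\n(.*?)@', sample, re.DOTALL)

print(without_dotall)  # ['', '']
print(with_dotall)     # ['81 00 00 5C\nB1 13 38 01\n', '14 5C 14 5C\n']
```

The lazy '.*?' matters as much as the flag: with a greedy '.*' and re.DOTALL, the first match would run all the way to the last @ in the text.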
Answer 1 (score: 0)
This sounds like a job for regular expressions!
\@[^\n]*\n([^\@]*)\n(?=\@)
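This regex matches, in order:
a literal @ symbol;
the rest of that line, up to and including its newline;
the content up to the next @ symbol, which is saved in group #1;
a final newline, accepted only when the next character is @ (the lookahead does not consume that character).
For example: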
>>> re.search(r'\@[^\n]*\n([^\@]*)\n(?=\@)', your_string).group(1)
'81 00 00 5C B1 13 3E 01 0C 43 B1 13 A6 00 1C 43 \nB1 13 38 01 32 D0 10 00 FD 3F 03 43 00 00 00 02 '
So, to get a list of the important content:
>>> [m.group(1) for m in re.finditer(r'\@[^\n]*\n([^\@]*)\n(?=\@)', your_string)]
['81 00 00 5C B1 13 3E 01 0C 43 B1 13 A6 00 1C 43 \nB1 13 38 01 32 D0 10 00 FD 3F 03 43 00 00 00 02 ', '14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C \n14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C \n14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 00 5C CF 0C ']
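The lookahead is what lets consecutive sections match. A small sketch on a shortened, made-up sample: if the trailing @ is consumed instead of merely checked, the next header is gone by the time the search resumes, and the second section is lost.

```python
import re

# Shortened stand-in for your_string (illustrative data only)
sample = "@5c00\n81 00\n@ffd2\n14 5C\n@\nq\n"

# With the lookahead, the closing '@' stays available as the next header
with_lookahead = [
    m.group(1)
    for m in re.finditer(r'\@[^\n]*\n([^\@]*)\n(?=\@)', sample)
]

# Without it, the closing '@' is consumed and '@ffd2' can no longer start a match
without_lookahead = [
    m.group(1)
    for m in re.finditer(r'\@[^\n]*\n([^\@]*)\n\@', sample)
]

print(with_lookahead)     # ['81 00', '14 5C']
print(without_lookahead)  # ['81 00']
```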
Or, a simpler answer:
re.split(r'\@[^\n]*\n', your_string)
This splits the string wherever it finds a line that starts with @.
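One thing to watch with the split version, shown here on a shortened made-up sample: re.split also returns whatever sits before the first @ line and after the last one, so the list needs trimming. The slice below assumes the text always starts with an @ line and ends with a bare @ followed by trailing text, as in the question.

```python
import re

# Shortened stand-in for your_string (illustrative data only)
sample = "@5c00\n81 00\nB1 13\n@ffd2\n14 5C\n@\nq\n"

# Every '@...' header line acts as a separator
parts = re.split(r'@[^\n]*\n', sample)
print(parts)     # ['', '81 00\nB1 13\n', '14 5C\n', 'q\n']

# Assumption: drop the empty leading piece and the text after the final bare '@'
sections = parts[1:-1]
print(sections)  # ['81 00\nB1 13\n', '14 5C\n']
```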
Answer 2 (score: 0)
Check this regex:
data = re.findall(r'^[\d \w]{2,}$',contents,re.M)
It simply picks out the lines that hold hex digits.
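A quick sketch of what this returns, on a shortened made-up sample: each data line comes back as its own list item, so the lines are not grouped by @ section. Note also that \w accepts any word character, not only hex digits; the stricter pattern below is my own assumption about the intended format (two-digit hex bytes separated by single spaces), not part of the answer.

```python
import re

# Shortened stand-in for contents (illustrative data only)
sample = "@5c00\n81 00\nB1 13\n@ffd2\n14 5C\n@\nq\n"

# Pattern from the answer: lines of two or more word characters or spaces
lines = re.findall(r'^[\d \w]{2,}$', sample, re.M)
print(lines)      # ['81 00', 'B1 13', '14 5C']

# Assumed stricter variant: only two-digit hex bytes, each optionally followed by a space
hex_lines = re.findall(r'^(?:[0-9A-Fa-f]{2} ?)+$', sample, re.M)
print(hex_lines)  # ['81 00', 'B1 13', '14 5C']
```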
Answer 3 (score: 0)
This regex should work:
import re
regex = r"^[^\@].*"
test_str = ("@5c00\n81 00 00\n76 20 11\n@ffd2\n")
matches = re.finditer(regex, test_str, re.MULTILINE)
for matchNum, match in enumerate(matches, start=1):
    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1
        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))
Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
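Answer 4 (score: 0)
Here we may not want to use a regular expression, since it can get a bit expensive. A plain string split may do fine. For example, we can split on @.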
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility
import re
test_str = '''
@bb00
81 00 00 5C B1 13 3E 01 0C 43 B1 13 A6 00 1C 43
B1 13 38 01 32 D0 10 00 FD 3F 03 43 00 00 00 02
@5c00
81 00 00 5C B1 13 3E 01 0C 43 B1 13 A6 00 1C 43
B1 13 38 01 32 D0 10 00 FD 3F 03 43 00 00 00 02
@ffd2
14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C
14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C
14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 00 5C CF 0C
@
81 00 00 5C B1 13 3E 01 0C 43 B1 13 A6 00 1C 43
B1 13 38 01 32 D0 10 00 FD 3F 03 43 00 00 00 02
'''
split_str = test_str.split('@')
data=[]
for matches in split_str:
    if (matches[:4] == '5c00' or matches[:4] == 'ffd2'):
        data.append(matches[5:])
print(data)
['81 00 00 5C B1 13 3E 01 0C 43 B1 13 A6 00 1C 43 \nB1 13 38 01 32 D0 10 00 FD 3F 03 43 00 00 00 02 \n', '14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C \n14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C \n14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 00 5C CF 0C \n']
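If the hard-coded slice lengths ([:4] and [5:]) feel fragile, the same split idea can be sketched with str.partition, which separates each header from its body at the first newline. The helper name parse_sections and the shortened sample are hypothetical, just for illustration.

```python
def parse_sections(text):
    """Map each '@address' header to the text that follows it."""
    sections = {}
    # Whatever precedes the first '@' is not a section, so skip it
    for chunk in text.split('@')[1:]:
        address, _, body = chunk.partition('\n')
        if address:  # a bare '@' line has no address; ignore it
            sections[address] = body
    return sections

# Shortened stand-in for test_str (illustrative data only)
sample = "@5c00\n81 00\nB1 13\n@ffd2\n14 5C\n@\nq\n"

sections = parse_sections(sample)
wanted = [sections[address] for address in ('5c00', 'ffd2')]
print(wanted)  # ['81 00\nB1 13\n', '14 5C\n']
```

Keying by address means headers of any length work, and asking for a section that is not in the file raises a KeyError instead of silently returning nothing.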