I am trying to extract a large number of values from a very complex string:
s = '04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]'
These are the names I need to scan for:
list = ['Compiler', 'HighPriority', 'Default', 'LowPriority', 'Special', 'Event', 'CommHigh', 'CommDefault']
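My goal is to get the 3 numbers that follow each name, so for HighPriority I would get [0, 74, 74], and I can then do something with each item.
I have used the following, but it does not handle the case where the values are not followed by a comma (for example, when they are followed by a closing bracket).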
def find_between(s, first, last):
    try:
        start = s.index(first) + len(first)
        end = s.index(last, start)
        return s[start:end]
    except ValueError:
        return ""

for l in list:
    print(l)
    print(find_between(s, l + ':', ',').split(':'))
Answer 0 (score: 2)
Edit: if you really want to avoid regular expressions, your approach can be tweaked slightly (I renamed list to l to avoid shadowing the built-in type):
from itertools import takewhile
from string import digits

def find_between(s, first):
    try:
        start = s.index(first) + len(first)
        # Keep taking the next character while it's either a ':' or a digit.
        # You can also just cast this into a list and forget about joining and later splitting.
        # Also, consider storing ':' + digits in a variable to avoid recreating it all the time.
        return ''.join(takewhile(lambda char: char in ':' + digits, s[start:]))
    except ValueError:
        return ""

for name in l:
    print(name)
    print(find_between(s, name + ':').split(':'))
This prints:
Compiler
['0', '0', '0']
HighPriority
['0', '74', '74']
Default
['6', '1872', '1874']
LowPriority
['0', '2', '2']
Special
['0', '2', '2']
Event
['0', '0', '0']
CommHigh
['0', '1134', '1152']
CommDefault
['0', '4', '4']
However, this really is a job for regular expressions, and you should try to learn the basics.
import re

def find_between(s, word):
    # Search for the word followed by three ':<digits>' groups
    x = re.search(r"(%s(:\d+){3})" % word, s)
    return x.groups()[0]

for word in l:
    print(find_between(s, word).split(':', 1)[-1].split(':'))
This prints:
['0', '0', '0']
['0', '74', '74']
['6', '1872', '1874']
['0', '2', '2']
['0', '2', '2']
['0', '0', '0']
['0', '1134', '1152']
['0', '4', '4']
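As a minimal sketch of the same regex idea, the variant below captures the three numbers directly instead of splitting afterwards, converts them to integers, and returns None when the name is absent. The helper name find_numbers is hypothetical, and the use of re.escape is an added precaution for names that might contain regex metacharacters; neither comes from the answer above.

```python
import re

s = ('04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) '
     'Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,'
     'Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:'
     'Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]')


def find_numbers(s, word):
    # find_numbers is a hypothetical helper name used for illustration only.
    # re.escape guards against names containing regex metacharacters.
    match = re.search(r"%s:(\d+):(\d+):(\d+)" % re.escape(word), s)
    if match is None:
        # The name does not occur with three numbers after it.
        return None
    return [int(n) for n in match.groups()]


print(find_numbers(s, 'HighPriority'))  # [0, 74, 74]
print(find_numbers(s, 'Missing'))       # None
```

Returning None rather than raising keeps the calling loop simple when a name may be missing from a given log line.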
Answer 1 (score: 0)
If the string is always well formed, this will give you all the groups:
re.findall(r'(\w+):(\d+):(\d+):(\d+)', s)
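It also picks up the timestamp (23:50:06:242) as the first match, which you can easily drop from the list.
Or you can use a dict comprehension to organize the items: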
matches = re.findall(r'(\w+):(\d+:\d+:\d+)', s)
my_dict = {k: v.split(':') for k, v in matches[1:]}
I use matches[1:] here to drop the spurious match from the timestamp. You can do this if you know it will always be there.
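If you would rather not depend on the timestamp always being the first match, one option is to filter by the names you actually care about. The following is only a sketch under that assumption; the variable names wanted and result are hypothetical, and the integer conversion is an addition.

```python
import re

s = ('04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) '
     'Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,'
     'Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:'
     'Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]')

# Names from the question; "wanted" is a hypothetical variable name.
wanted = ['Compiler', 'HighPriority', 'Default', 'LowPriority',
          'Special', 'Event', 'CommHigh', 'CommDefault']

matches = re.findall(r'(\w+):(\d+):(\d+):(\d+)', s)

# Keep only the names we asked for, so the timestamp match is dropped
# regardless of where it appears in the list of matches.
result = {name: [int(a), int(b), int(c)]
          for name, a, b, c in matches if name in wanted}

print(result['CommHigh'])  # [0, 1134, 1152]
print('23' in result)      # False
```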
Answer 2 (score: 0)
Check this out:
import re
s = '04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]'
search = ['Compiler', 'HighPriority', 'Default', 'LowPriority', 'Special', 'Event', 'CommHigh', 'CommDefault']
data = []
for x in search:
    data.append(re.findall(x + ':([0-9]+:[0-9]+:[0-9]+)', s))
data = [[m.split(':') for m in x] for x in data]  # split each match on ':'
data = [x[0] for x in data]  # keep only the first match for each name
data = [[int(n) for n in x] for x in data]  # convert to int
print(data)
>>>[[0, 0, 0], [0, 74, 74], [6, 1872, 1874], [0, 2, 2], [0, 2, 2], [0, 0, 0], [0, 1134, 1152], [0, 4, 4]]
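One thing to watch for with per-name patterns: 'Default' is also a substring of 'CommDefault', so the pattern for 'Default' finds two matches in the original string. Keeping the first match works there only because Default appears before CommDefault. The sketch below uses a hypothetical, reordered sample string (not from the question) to show the difference a negative lookbehind makes.

```python
import re

# Hypothetical sample with the sections reordered, for illustration only.
sample = 'Comm[CommDefault:0:4:4]:Core[Default:6:1872:1874]'

# Without an anchor, 'Default' also matches inside 'CommDefault'.
loose = re.findall(r'Default:(\d+):(\d+):(\d+)', sample)

# (?<!\w) requires that no word character precedes the name,
# so the match inside 'CommDefault' is rejected.
strict = re.findall(r'(?<!\w)Default:(\d+):(\d+):(\d+)', sample)

print(loose)   # [('0', '4', '4'), ('6', '1872', '1874')]
print(strict)  # [('6', '1872', '1874')]
```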