Question

I'm trying to pull a large number of values out of a fairly complex string:

s = '04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]'

These are the names I need to scan for:

list = ['Compiler', 'HighPriority', 'Default', 'LowPriority', 'Special', 'Event', 'CommHigh', 'CommDefault']

My goal is to get the 3 numbers that follow each name, so for HighPriority I would get [0, 74, 74], and then I can do something with each item.

I've been using the following, but it doesn't handle the case where the group of numbers isn't terminated by a comma.

def find_between( s, first, last ):
    try:
        start = s.index( first ) + len( first )
        end = s.index( last, start )
        return s[start:end]
    except ValueError:
        return ""


for l in list:
    print l
    print find_between( s, l + ':', ',' ).split(':')
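
To make the failure concrete, here is a minimal sketch of the same function run against three of the names. It assumes Python 3, so the print calls are adapted; everything else is taken from the snippet above.

```python
s = '04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]'

def find_between(s, first, last):
    try:
        start = s.index(first) + len(first)
        end = s.index(last, start)
        return s[start:end]
    except ValueError:
        return ""

# Followed by a comma, so only the three numbers come back.
print(find_between(s, 'HighPriority:', ','))
# Followed by ']', so the slice runs on to the next comma, much further along.
print(find_between(s, 'LowPriority:', ','))
# Last entry: there is no later comma, index() raises ValueError, '' is returned.
print(repr(find_between(s, 'CommDefault:', ',')))
```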

Answer 1

Edit: if you really want to avoid regular expressions, your approach can be tweaked (I renamed list to l to avoid shadowing the built-in type):

from itertools import takewhile
from string import digits

def find_between(s, first):
    try:
        start = s.index(first) + len(first)
        # Keep taking the next character while it's either a ':' or a digit
        # You can also just cast this into a list and forget about joining and later splitting.
        # Also, consider storing ':'+digits in a variable to avoid recreating it all the time
        return ''.join(takewhile(lambda char: char in ':'+digits, s[start:]))
    except ValueError:
        return ""


for _ in l:
    print _
    print find_between(s, _ + ':').split(':')

Prints:

Compiler
['0', '0', '0']
HighPriority
['0', '74', '74']
Default
['6', '1872', '1874']
LowPriority
['0', '2', '2']
Special
['0', '2', '2']
Event
['0', '0', '0']
CommHigh
['0', '1134', '1152']
CommDefault
['0', '4', '4']
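
The comments in the takewhile version suggest storing ':' + digits once and skipping the join-then-split round trip. A rough Python 3 sketch along those lines follows; the names ALLOWED and numbers_after are made up for illustration, and it returns integers rather than strings.

```python
from itertools import takewhile
from string import digits

s = '04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]'

# Built once instead of on every character test (hypothetical name).
ALLOWED = set(':' + digits)

def numbers_after(s, name):
    """Return the colon-separated integers that directly follow 'name:'."""
    start = s.find(name + ':')
    if start == -1:
        return []  # name not present
    start += len(name) + 1
    # Keep characters while they are digits or ':'; stop at ',' or ']' or anything else.
    run = ''.join(takewhile(lambda ch: ch in ALLOWED, s[start:]))
    return [int(part) for part in run.split(':') if part]

for name in ['HighPriority', 'LowPriority', 'CommDefault']:
    print(name, numbers_after(s, name))
```

Like the original, this looks for the first plain substring occurrence of the name, so it relies on a short name such as Default not appearing earlier in the string as the tail of a longer one.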

That said, this really is a job for regular expressions, and it is worth learning the basics.

import re

def find_between(s, word):
    # Search for the word followed by (':' plus one or more digits), repeated three times
    x = re.search(r"(%s(:\d+){3})" % word, s)
    return x.groups()[0]

for word in l:
    print find_between(s, word).split(':', 1)[-1].split(':')

Prints:

['0', '0', '0']
['0', '74', '74']
['6', '1872', '1874']
['0', '2', '2']
['0', '2', '2']
['0', '0', '0']
['0', '1134', '1152']
['0', '4', '4']
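
Two edge cases are worth keeping in mind with the regex version: re.search returns None when the word is absent, so x.groups() would raise AttributeError, and a bare word such as Default can also match inside CommDefault. A possible Python 3 variant that guards against both is sketched below. The function name find_numbers is hypothetical, and the use of re.escape and \b is my own addition, not part of the answer.

```python
import re

s = '04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]'

def find_numbers(s, word):
    """Return the three integers after 'word:', or None if there is no match."""
    # \b stops 'Default' from matching inside 'CommDefault';
    # re.escape keeps any special characters in word literal.
    match = re.search(r'\b' + re.escape(word) + r':(\d+):(\d+):(\d+)', s)
    if match is None:
        return None
    return [int(group) for group in match.groups()]

print(find_numbers(s, 'HighPriority'))
print(find_numbers(s, 'NotThere'))
```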

Answer 2

If the string is always well-formed, this will give you all the groups:

re.findall(r'(\w+):(\d+):(\d+):(\d+)', s)

It also picks up the timestamp, which you can easily remove from the list.

Or you can use a dictionary comprehension to organize the items:

matches = re.findall(r'(\w+):(\d+:\d+:\d+)', s)
my_dict = {k: v.split(':') for k, v in matches[1:]}

I use matches[1:] here to drop the spurious match (the timestamp). You can do this if you know it will always be there.
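
If the timestamp might not always be the first match, one alternative is to keep only the names you actually care about instead of slicing by position. This is a Python 3 sketch; the variable names wanted and by_name are hypothetical.

```python
import re

s = '04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]'
names = ['Compiler', 'HighPriority', 'Default', 'LowPriority',
         'Special', 'Event', 'CommHigh', 'CommDefault']

wanted = set(names)
matches = re.findall(r'(\w+):(\d+):(\d+):(\d+)', s)

# Each match is a 4-tuple: (name, n1, n2, n3). The timestamp match has the
# "name" '23', which is not in wanted, so it is filtered out by key.
by_name = {m[0]: [int(n) for n in m[1:]] for m in matches if m[0] in wanted}

print(by_name['HighPriority'])
```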

Answer 3

Check this out:

import re
s = '04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]'
search = ['Compiler', 'HighPriority', 'Default', 'LowPriority', 'Special', 'Event', 'CommHigh', 'CommDefault']
data = []
for x in search:
    data.append(re.findall(x+':([0-9]+:[0-9]+:[0-9]+)', s))

data = [map(lambda x: x.split(':'), x) for x in data] # remove :
data = [x[0] for x in data] # remove unnecessary []
data = [map(int,x) for x in data] # convert to int
print data

>>>[[0, 0, 0], [0, 74, 74], [6, 1872, 1874], [0, 2, 2], [0, 2, 2], [0, 0, 0], [0, 1134, 1152], [0, 4, 4]]
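
The loop above scans the string once per name. As a possible alternative, the names can be combined into a single alternation so the string is scanned once. This sketch assumes Python 3; pattern and found are hypothetical names, and the \b anchor and re.escape are my additions rather than part of the answer.

```python
import re

s = '04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]'
search = ['Compiler', 'HighPriority', 'Default', 'LowPriority',
          'Special', 'Event', 'CommHigh', 'CommDefault']

# One pattern for all names: \b(Compiler|HighPriority|...):(\d+):(\d+):(\d+)
pattern = re.compile(
    r'\b(' + '|'.join(re.escape(name) for name in search) + r'):(\d+):(\d+):(\d+)'
)

found = {}
for m in pattern.finditer(s):
    # setdefault keeps the first occurrence if a name appears more than once.
    found.setdefault(m.group(1), [int(n) for n in m.groups()[1:]])

# Rebuild the list in the order of search; raises KeyError if a name was not found.
data = [found[name] for name in search]
print(data)
```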
