Question

I have a file containing the following text:

loadbalancer {
upstream application1 {
server 127.0.0.1:8082;
server 127.0.0.1:8083;
server 127.0.0.1:8084;
}
upstream application2 {
server 127.0.0.1:8092;
server 127.0.0.1:8093;
server 127.0.0.1:8094;
}
}

Does anyone know how I can extract the following variables:

appList=["application1","application2"]
ServerOfapp1=["127.0.0.1:8082","127.0.0.1:8083","127.0.0.1:8084"]
ServerOfapp2=["127.0.0.1:8092","127.0.0.1:8093","127.0.0.1:8094"]

... and so on.

Answer 1

If the lines you want always start with upstream and server, then this should work:

app_dic = {}
with open('file.txt','r') as f:
    for line in f:
        if line.startswith('upstream'):
            app_i = line.split()[1]
            server_of_app_i = []
            for line in f:
                if not line.startswith('server'):
                    break
                server_of_app_i.append(line.split()[1][:-1])
            app_dic[app_i] = server_of_app_i

app_dic should then be a dictionary of lists:

{'application1': ['127.0.0.1:8082', '127.0.0.1:8083', '127.0.0.1:8084'],
'application2': ['127.0.0.1:8092', '127.0.0.1:8093', '127.0.0.1:8094']}
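
To get from this dictionary to the separate lists asked for in the question, something like the following sketch may be enough. The names app_list and servers_of_app1 are only illustrative, and app_dic is hard-coded here so the snippet runs on its own.

```python
# app_dic as built by the loop above (hard-coded so this runs standalone)
app_dic = {
    'application1': ['127.0.0.1:8082', '127.0.0.1:8083', '127.0.0.1:8084'],
    'application2': ['127.0.0.1:8092', '127.0.0.1:8093', '127.0.0.1:8094'],
}

# The keys are the application names (insertion order on Python 3.7+)
app_list = list(app_dic)

# Look servers up by name instead of creating one variable per application
servers_of_app1 = app_dic['application1']

print(app_list)
print(servers_of_app1)
```

Keeping the servers in the dictionary usually scales better than generating variable names such as ServerOfapp1, because the number of applications does not have to be known in advance.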

Edit

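If the input file doesn't contain any newlines, and as long as the file isn't too large, you can read it into a list and iterate over that: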

app_dic = {}
with open('file.txt','r') as f:
    txt_iter = iter(f.read().split())  # iterator over the whitespace-separated tokens
for word in txt_iter:
    if word == 'upstream':
        app_i = next(txt_iter)
        server_of_app_i=[]
        for word in txt_iter:
            if word == 'server':
                server_of_app_i.append(next(txt_iter)[:-1])
            elif word == '}':
                break
        app_dic[app_i] = server_of_app_i

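This is uglier, because you have to look for the closing curly brace in order to break out of the inner loop. If it gets any more complicated, regular expressions should be used.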
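
As a rough idea of what a regular-expression version could look like, here is a minimal sketch using only the standard re module. The names block_re and server_re are made up for illustration, and it assumes that an upstream block never contains nested braces.

```python
import re

# Sample input on a single line, as in the "no newlines" case
text = (
    "loadbalancer { upstream application1 { server 127.0.0.1:8082; "
    "server 127.0.0.1:8083; server 127.0.0.1:8084; } "
    "upstream application2 { server 127.0.0.1:8092; "
    "server 127.0.0.1:8093; server 127.0.0.1:8094; } }"
)

# One match per "upstream <name> { ... }" block (assumes no nested braces)
block_re = re.compile(r'upstream\s+(\w+)\s*\{([^}]*)\}')
# One match per "server <address>;" entry inside a block body
server_re = re.compile(r'server\s+([^\s;]+)\s*;')

app_dic = {}
for name, body in block_re.findall(text):
    app_dic[name] = server_re.findall(body)

print(app_dic)
```

Because \s matches newlines as well as spaces, the same two patterns should also work on the multi-line form of the file.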

Answer 2

If you are able to use Matthew Barnett's newer regex module, you can use the following solution (see also the additional demo on regex101.com):

import regex as re

rx = re.compile(r"""
    (?:(?P<application>application\d)\s{\n| # "application" + digit + { + newline
    (?!\A)\G\n)                             # assert that the next match starts here
    server\s                                # match "server"
    (?P<server>[\d.:]+);                    # followed by digits, . and :
    """, re.VERBOSE)

string = """
loadbalancer {
upstream application1 {
server 127.0.0.1:8082;
server 127.0.0.1:8083;
server 127.0.0.1:8084;
}
upstream application2 {
server 127.0.0.1:8092;
server 127.0.0.1:8093;
server 127.0.0.1:8094;
}
}
"""

result = {}
for match in rx.finditer(string):
    if match.group('application'):
        current = match.group('application')
        result[current] = list()
    if current:
        result[current].append(match.group('server'))

print(result)
# {'application1': ['127.0.0.1:8082', '127.0.0.1:8083', '127.0.0.1:8084'], 'application2': ['127.0.0.1:8092', '127.0.0.1:8093', '127.0.0.1:8094']}

This uses the \G anchor, named capture groups and some programming logic.

Answer 3

This is the basic approach:

import re

# each of your objects here
objText = "xyz xcyz 244.233.233.2:123"
# Python patterns take no /.../g delimiters; findall already returns every match
listOfAll = re.findall(r"\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?):[0-9]{1,5}", objText)

for eachMatch in listOfAll:
    print("Here's one! %s" % eachMatch)

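Obviously it's a bit rough around the edges, but it will run a thorough regex search over whatever string it is given. A better solution might be to pass it the objects themselves, but right now I'm not sure what you will have as raw input. I'll try to improve the regex, though.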
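
For illustration, the same address pattern could be run over the text from the question as in the sketch below. ADDR_RE and addresses are names chosen here, not part of the original answer, and note that a flat findall like this does not record which application each server belongs to.

```python
import re

# Dotted-quad address followed by ":port", written as a plain Python raw string
ADDR_RE = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?):[0-9]{1,5}"
)

text = """
loadbalancer {
upstream application1 {
server 127.0.0.1:8082;
server 127.0.0.1:8083;
server 127.0.0.1:8084;
}
upstream application2 {
server 127.0.0.1:8092;
server 127.0.0.1:8093;
server 127.0.0.1:8094;
}
}
"""

# All groups are non-capturing, so findall returns the full matched strings
addresses = ADDR_RE.findall(text)
print(addresses)
```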

Answer 4

I believe this can also be solved with re:

>>> import re
>>> from collections import defaultdict
>>>
>>> APP = r'\b(?P<APP>application\d+)\b'
>>> IP = r'server\s+(?P<IP>[\d\.:]+);' 
>>> 
>>> pat = re.compile('|'.join([APP, IP]))
>>> s = open('file.txt').read()  # assumed: s holds the text of the file from the question
>>> scan = pat.scanner(s)
>>> d = defaultdict(list)
>>> 
>>> for m in iter(scan.search, None):
        group = m.lastgroup
        if group == 'APP':
            keygroup = m.group(group)
            continue
        else:
            d[keygroup].append(m.group(group))


>>> d
defaultdict(<class 'list'>, {'application1': ['127.0.0.1:8082', '127.0.0.1:8083', '127.0.0.1:8084'], 'application2': ['127.0.0.1:8092', '127.0.0.1:8093', '127.0.0.1:8094']})

Or, similarly, with re.finditer and without pat.scanner:

>>> d = defaultdict(list)  # start from an empty dict again
>>> for m in re.finditer(pat, s):
        group = m.lastgroup
        if group == 'APP':
            keygroup = m.group(group)
            continue
        else:
            d[keygroup].append(m.group(group))


>>> d
defaultdict(<class 'list'>, {'application1': ['127.0.0.1:8082', '127.0.0.1:8083', '127.0.0.1:8084'], 'application2': ['127.0.0.1:8092', '127.0.0.1:8093', '127.0.0.1:8094']})
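
Outside an interactive session, the same idea could be wrapped in a small function, roughly as sketched below. The function name parse_upstreams and the sample text are assumptions made for this example; the two patterns are the ones from the answer.

```python
import re
from collections import defaultdict

APP = r'\b(?P<APP>application\d+)\b'
IP = r'server\s+(?P<IP>[\d.:]+);'
PAT = re.compile('|'.join([APP, IP]))


def parse_upstreams(text):
    """Map each application name to the list of servers that follow it."""
    servers = defaultdict(list)
    current_app = None
    for m in PAT.finditer(text):
        if m.lastgroup == 'APP':
            # A new application name starts a new group of servers
            current_app = m.group('APP')
        elif current_app is not None:
            # Server entries are attached to the most recent application
            servers[current_app].append(m.group('IP'))
    return dict(servers)


sample = """
loadbalancer {
upstream application1 {
server 127.0.0.1:8082;
server 127.0.0.1:8083;
server 127.0.0.1:8084;
}
upstream application2 {
server 127.0.0.1:8092;
server 127.0.0.1:8093;
server 127.0.0.1:8094;
}
}
"""

parsed = parse_upstreams(sample)
print(parsed)
```

Tracking current_app explicitly avoids a NameError if a server line ever appears before the first application name.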

Convert part of a string into a variable name in Python

4 answers: