Decompressing a string into an expanded string

Date: 2016-08-19 04:15:44

Tags: python

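I receive strings in the following format: "a{1;4:6}" and "a{1;2}b{2:4}", where ; separates two distinct numbers and : denotes a range of numbers. The braces can contain any combination of semicolons and colons.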

I want to expand these strings so that the two examples above produce:

  • "a{1;4:6}" = "a1a4a5a6"
  • "a{1;2}b{2:4}" = "a1b2b3b4a2b2b3b4"
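Reading the examples, each brace body is a `;`-separated list in which an item `x:y` stands for the inclusive range x..y. A minimal sketch of turning one brace body into its list of numbers, assuming exactly those two rules (the helper name `parse_spec` is hypothetical, not from the question):

```python
def parse_spec(spec):
    """Turn a brace body such as '1;4:6' into [1, 4, 5, 6].

    Assumes ';' separates items and 'x:y' is an inclusive range,
    as described in the question.
    """
    numbers = []
    for item in spec.split(';'):
        if ':' in item:
            start, stop = map(int, item.split(':'))
            numbers.extend(range(start, stop + 1))  # inclusive upper bound
        else:
            numbers.append(int(item))
    return numbers

print(parse_spec('1;4:6'))  # [1, 4, 5, 6]
print(parse_spec('2:4'))    # [2, 3, 4]
```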

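I have never dealt with anything like this before, because I'm usually given strings in some ready-made format that is easy to parse. In this case I have to parse the string by hand.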

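My attempt is to split the string by hand, over and over, until I reach a colon or semicolon case, and then build up the string from there. This is very inefficient, and I'd welcome any ideas for a better approach. This is roughly what the code looks like (I've left a lot out to get to the point faster):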

>>> s = "a{1;4:6}"
>>> splitted = s.split("}")
>>> splitted
['a{1;4:6', '']
>>> splitted2 = [s.split("{") for s in splitted]
>>> splitted2
[['a', '1;4:6'], ['']]
>>> splitted3 = [s.split(";") for s in splitted2[0]]
>>> splitted3
[['a'], ['1', '4:6']]

# ... etc, then build up the strings manually once the ranges are figured out.

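The idea behind splitting on the closing brace first is that a new identifier with its associated range is guaranteed to follow it. Where am I going wrong? My approach works for simple strings such as the first example, but not for the second one. It is also inefficient. I'd appreciate any input on this problem.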
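In the snippet as shown, only `splitted2[0]` is processed further, so every group after the first one is dropped. Rather than chaining `split()` calls, one alternative is to let a single regular expression pick out every `letter{...}` group in order; each match already pairs a prefix with its brace body. A sketch, assuming prefixes are single lowercase letters (the pattern name `GROUP_RE` and the function `tokenize` are hypothetical):

```python
import re

# Hypothetical pattern: one lowercase letter followed by a {...} body.
GROUP_RE = re.compile(r'([a-z])\{([^}]*)\}')

def tokenize(compressed):
    """Return (prefix, body) pairs in the order they appear."""
    return GROUP_RE.findall(compressed)

print(tokenize('a{1;4:6}'))      # [('a', '1;4:6')]
print(tokenize('a{1;2}b{2:4}'))  # [('a', '1;2'), ('b', '2:4')]
```

Each body can then be expanded into numbers separately, and the groups combined afterwards.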

3 Answers:

Answer 0 (score: 7)

I tried pyparsing; IMHO it produces very readable code (pack_tokens is taken from the previous answer).

from pyparsing import nums, Literal, Word, OneOrMore, delimitedList
from string import ascii_lowercase as letters

# transform a '123' to 123
number = Word(nums).setParseAction(lambda s, l, t: int(t[0]))

# parses 234:543 ranges
range_ = number + Literal(':').suppress() + number

# transforms the range x:y to a list [x, x+1, ..., y]
range_.setParseAction(lambda s, l, t: list(range(t[0], t[1]+1)))

# parse the semicolon-delimited list of ranges or individual numbers
range_list = delimitedList(range_ | number, ";")

# and pack them in a tuple
range_list.setParseAction(lambda s, l, t: tuple(t))

# parses 'a{2;3;4:5}' group
group = Word(letters, max=1) + Literal('{').suppress() + range_list + Literal('}').suppress()

# transform the group parsed as ['a', (2, 3, 4, 5)] to ('a2', 'a3', ...)
group.setParseAction(lambda s, l, t: tuple("%s%d" % (t[0], num) for num in t[1]))

# the full expression is just those groups one after another
expression = OneOrMore(group)

def pack_tokens(s, l, tokens):
    current, *rest = tokens
    if not rest:
        return ''.join(current)  # base case
    return ''.join(token + pack_tokens(s, l, rest) for token in current)

expression.setParseAction(pack_tokens)


parsed = expression.parseString('a{1;2;3}')[0]
print(parsed)
parsed = expression.parseString('a{1;3:7}b{1:5}')[0]
print(parsed)

Answer 1 (score: 4)

import re

def expand(compressed):

    # 'b{2:4}' -> 'b{2;3;4}' i.e. reduce the problem to just one syntax
    normalized = re.sub(r'(\d+):(\d+)', lambda m: ';'.join(map(str, range(int(m.group(1)), int(m.group(2)) + 1))), compressed)

    # 'a{1;2}b{2;3;4}' -> ['a{1;2}', 'b{2;3;4}']
    elements = re.findall(r'[a-z]\{[\d;]+\}', normalized)

    tokens = []

    # ['a{1;2}', 'b{2;3;4}'] -> [['a1', 'a2'], ['b2', 'b3', 'b4']]
    for element in elements:
        match = re.match(r'([a-z])\{([\d;]+)\}', element)

        alphanumerics = []  # match result already guaranteed by re.findall()

        for number in match.group(2).split(';'):
            alphanumerics.append(match.group(1) + number)

        tokens.append(alphanumerics)

    # [['a1', 'a2'], ['b2', 'b3', 'b4']] -> 'a1b2b3b4a2b2b3b4'
    def pack_tokens(tokens):

        current, *rest = tokens

        if not rest:
            return ''.join(current)  # base case

        return ''.join(token + pack_tokens(rest) for token in current)

    return pack_tokens(tokens)

strings = ['a{1;4:6}', 'a{1;2}b{2:4}', 'a{1;2}b{2:4}c{3;6}']

for string in strings:
    print(string, '->', expand(string))

Output

a{1;4:6} -> a1a4a5a6
a{1;2}b{2:4} -> a1b2b3b4a2b2b3b4
a{1;2}b{2:4}c{3;6} -> a1b2c3c6b3c3c6b4c3c6a2b2c3c6b3c3c6b4c3c6
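Note that this expansion is not a plain Cartesian product (which would give a1b2a1b3...): each token is followed by the complete expansion of all the groups after it. That makes `pack_tokens` a right fold, so it can also be written without explicit recursion. A sketch using `functools.reduce` (the name `pack_tokens_iter` is hypothetical):

```python
from functools import reduce

def pack_tokens_iter(groups):
    """Non-recursive equivalent of pack_tokens.

    Works from the last group backwards: each token of a group is
    followed by the complete expansion of all the groups after it.
    """
    return reduce(lambda tail, group: ''.join(tok + tail for tok in group),
                  reversed(groups), '')

print(pack_tokens_iter([['a1', 'a2'], ['b2', 'b3', 'b4']]))  # a1b2b3b4a2b2b3b4
```

Unlike the recursive version, this also returns '' for an empty list of groups instead of raising an error.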

Answer 2 (score: 2)

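This is just to demonstrate the trick of doing this with eval (as @ialcuaz mentioned in the comments). Again, I would not recommend it here; the other answers are more appropriate. The technique can be useful when the structure is more complex (i.e. recursive, with parentheses, etc.) and you don't want to write a full parser.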

import re
import functools

class Group(object):
    def __init__(self, prefix, items):
        self.groups = [[prefix + str(x) for x in items]]

    def __add__(self, other):
        self.groups.extend(other.groups)
        return self

    def __repr__(self):
        return self.pack_tokens(self.groups)

    # adapted for Python 2.7 from @cdlane's code
    def pack_tokens(self, tokens):
        current = tokens[:1][0]
        rest = tokens[1:]
        if not rest:
            return ''.join(current)
        return ''.join(token + self.pack_tokens(rest) for token in current)

def createGroup(prefix, *items):  # 'prefix' avoids shadowing the built-in str
    return Group(prefix, items)

def expand(compressed):

    # Replace a{...}b{...} with a{...} + b{...} as we will overload the '+' operator to help during the evaluation
    expr = re.sub(r'(\}\w+\{)', lambda m: '} + ' + m.group(1)[1:-1] + '{', compressed)

    # Expand : range to explicit list of items (from @cdlane's answer)
    expr = re.sub(r'(\d+):(\d+)', lambda m: ';'.join(map(str, range(int(m.group(1)), int(m.group(2)) + 1))), expr)

    # Convert a{x;y;..} to a(x,y, ...) so that it evaluates as a function
    expr = expr.replace('{', '(').replace('}', ')').replace(";", ",")

    # Extract the group prefixes ('a', 'b', ...)
    groupPrefixes = re.findall(r'(\w+)\([\d,]+\)', expr)

    # Build a namespace mapping functions 'a', 'b', ... to createGroup() capturing the groupName prefix in the closure
    ns = {prefix: functools.partial(createGroup, prefix) for prefix in groupPrefixes}

    # Evaluate the expression using the namespace
    return eval(expr, ns)

tests = ['a{1;4:6}', 'a{1;2}b{2:4}', 'a{1;2}b{2:4}c{3;6}']
for test in tests:
    print(test, '->', expand(test))

Produces:

('a{1;4:6}', '->', a1a4a5a6)
('a{1;2}b{2:4}', '->', a1b2b3b4a2b2b3b4)
('a{1;2}b{2:4}c{3;6}', '->', a1b2c3c6b3c3c6b4c3c6a2b2c3c6b3c3c6b4c3c6)
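Since the answer itself advises against eval, here is a sketch of the same operator-overloading idea with the eval step removed: the groups are found with a regex and combined with `+` through `functools.reduce`, so no input text is ever executed. This version is Python 3 (the answer's code targets 2.7), `+` returns a new Group instead of mutating the left operand, and the names `from_prefix` and `numbers` are hypothetical helpers, not part of the answer.

```python
import operator
import re
from functools import reduce

class Group:
    def __init__(self, groups):
        self.groups = groups  # list of token lists, e.g. [['a1', 'a2']]

    @classmethod
    def from_prefix(cls, prefix, items):
        return cls([[prefix + str(x) for x in items]])

    def __add__(self, other):
        # Return a new Group instead of mutating self
        return Group(self.groups + other.groups)

    def __str__(self):
        return self._pack(self.groups)

    def _pack(self, tokens):
        current, rest = tokens[0], tokens[1:]
        if not rest:
            return ''.join(current)
        return ''.join(tok + self._pack(rest) for tok in current)

def numbers(body):
    """Expand a brace body such as '1;4:6' into [1, 4, 5, 6]."""
    out = []
    for item in body.split(';'):
        if ':' in item:
            lo, hi = map(int, item.split(':'))
            out.extend(range(lo, hi + 1))
        else:
            out.append(int(item))
    return out

def expand(compressed):
    groups = [Group.from_prefix(prefix, numbers(body))
              for prefix, body in re.findall(r'(\w+)\{([\d;:]+)\}', compressed)]
    return str(reduce(operator.add, groups))

for test in ['a{1;4:6}', 'a{1;2}b{2:4}', 'a{1;2}b{2:4}c{3;6}']:
    print(test, '->', expand(test))
```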