Question

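I am trying to write a simple program to help with building army lists for a popular tabletop wargame. It is mostly an exercise for my own benefit, since there are plenty of ready-made software packages that already do this, but the idea behind it seems fairly simple.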

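The program reads the data for all the units available in an army from a spreadsheet and creates various classes for each unit. The main thing I am looking at right now is the options/upgrades.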

In the file, I want the option field of each unit to use a simple syntax. For example, the option string itemA, itemB/itemC-3, 2*itemD, itemE/itemF/itemG, itemH/itemI+itemJ would mean:

    1. you may take itemA (X pts per model)
    2. for every 3 models, you may exchange itemB with 
         a) itemC (net X pts per model)
    3. each model may take 2 of itemD (X pts per model)
    4. each model may take one of either 
         a)itemE (X pts per model)
         b)itemF (X pts per model)
         c)itemG (X pts per model)
    5. each model may take either 
         a)itemH (X points per model)
         b)itemI and itemJ (X points per model)

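At the moment I am processing the strings with a lot of splits and if statements, which makes it very hard to keep track of the options and assign them correctly once the user has entered a choice.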

    for index, option in enumerate(self.options):
        output = "{}.".format(index+1)
        if '-' in option:
            sub_option, no_models = option.split('-')
            no_models = int(no_models)
            print(sub_option)
            print(no_models)
            output += "For every {} models ".format(no_models)
            if '/' in sub_option:
                temp_str, temp_options, points_list = exchange_option(sub_option)

            else:
                temp_str, temp_options, points_list = standard_option(sub_option)

            index_points.append(points_list)
            temp_options.append(no_models)
            index_options.append(temp_options)

        else:
            if '/' in option:
                temp_str, temp_options, points_list = exchange_option(option)
            else:
                temp_str, temp_options, points_list = standard_option(option)

            index_points.append(points_list)
            index_options.append(temp_options)

        output += temp_str

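The *_option() functions are additional helper functions that I have defined above; they have a similar structure, with further if statements inside them.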

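The main question I am asking is: is there a simpler way to process code-like strings such as these? The example above does produce the output, but handling the user input afterwards seems very cumbersome.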

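What I want to do is first print the text given in the example at the top of the question, then take the user's input as the index of the chosen option and modify the associated unit class so that it has the correct equipment and points value.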

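I thought about trying to make some kind of option class, but labelling and defining each option so that they interact with each other correctly seems just as complex. I feel there must be something more Pythonic, or simply a better coding practice, for processing encoded strings like this?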

Answer 1

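So, here is a full-blown parser! Right now it only outputs the list from the previous version of the question, but it should not be too hard to add the functionality you want. Also note that the lexer currently does not raise an error when the string contains an invalid token, but this is just a proof of concept, so that should be fine.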

Part 1: the lexer

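This tokenizes the input string: it goes through it from left to right and tries to classify non-overlapping substrings as instances of tokens. It is meant to be used before parsing. Given a string, Lexer.tokenize yields a stream of Token objects.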

# FILE: lex.py

import re
import enum

class Token:
    def __init__(self, type, value: str, lineno: int, pos: int):
        self.type, self.value, self.lineno, self.pos = type, value, lineno, pos

    def __str__(self):
        v = f'({self.value!r})' if self.value else ''

        return f'{self.type.name}{v} at {self.lineno}:{self.pos}'

    __repr__ = __str__


class Lexer:
    def __init__(self, token_types: enum.Enum, tokens_regexes: dict):
        self.token_types = token_types

        # one named group per token type, e.g. (?P<plus>\+), joined into a single alternation
        regex = '|'.join(f'(?P<{tok.name}>{regex})' for tok, regex in tokens_regexes.items())
        self.regex = re.compile(regex)


    def tokenize(self, string, skip=('space',)):
        # TODO: detect invalid input

        lineno, pos = 0, 0
        skip = set(map(self.token_types.__getitem__, skip))

        for matchobj in self.regex.finditer(string):
            type_name = matchobj.lastgroup
            value = matchobj.groupdict()[type_name]

            Type = self.token_types[type_name]

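            if Type == self.token_types.newline: # possibly buggy, but not catastrophic
                # lineno and pos are locals of this generator, not attributes of the lexer
                lineno += 1
                pos = 0  # note: the next token overwrites this with an absolute offset
                continue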

            pos = matchobj.end()

            if Type not in skip:
                yield Token(Type, value, lineno, pos)   

        yield Token(self.token_types.EOF, '', lineno, pos)
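
The TODO in tokenize (detecting invalid input) could be handled in several ways. One common approach, sketched below under my own assumptions, is to append a catch-all group to the combined regex so that any character no other token matches is reported instead of being skipped silently. The `mismatch` group, the `TOKEN_SPEC` list and the standalone `tokenize` function are hypothetical names for illustration and are not part of lex.py; the sketch also assumes single-line option strings, so a newline is treated as invalid.

```python
import re

# Same token patterns as in the answer, plus a hypothetical catch-all group.
TOKEN_SPEC = [
    ('item', r'[a-zA-Z_]\w*'),
    ('num', r'0|[1-9][0-9]*'),
    ('plus', r'\+'),
    ('minus', r'-'),
    ('star', r'\*'),
    ('slash', r'/'),
    ('comma', r','),
    ('space', r'[ \t]+'),
    ('mismatch', r'.'),  # assumption: anything else is an error; must come last
]

# re.DOTALL lets '.' match a newline too, so no character is skipped silently.
TOKEN_RE = re.compile(
    '|'.join(f'(?P<{name}>{pattern})' for name, pattern in TOKEN_SPEC),
    re.DOTALL,
)


def tokenize(string):
    """Yield (type_name, value) pairs; raise SyntaxError on an unknown character."""
    for m in TOKEN_RE.finditer(string):
        kind = m.lastgroup
        if kind == 'space':
            continue
        if kind == 'mismatch':
            raise SyntaxError(
                f'Unexpected character {m.group()!r} at position {m.start()}'
            )
        yield kind, m.group()


print(list(tokenize('2*itemD')))
# [('num', '2'), ('star', '*'), ('item', 'itemD')]
```

Because the alternatives are tried in order, the catch-all only fires when none of the real token patterns match at the current position.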

Part 2: the parser (with syntax-directed evaluation):

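This parses the stream of tokens supplied by lex.Lexer.tokenize and translates the individual symbols into English according to the following grammar: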

Opt_list -> Option Opt_list_
Opt_list_ -> comma Option Opt_list_ | empty
Option -> Choice | Mult
Choice -> Compound More_choices Exchange
Compound -> item Add_item
Add_item -> plus item Add_item | empty
More_choices -> slash Compound More_choices | empty
Exchange -> minus num | empty
Mult -> num star Compound

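Symbols that start with an uppercase letter are non-terminals and lowercase symbols are terminals (empty stands for the empty production). There is also a special symbol, EOF, which is not shown here.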

Also, take a look at the vital statistics of this grammar. The grammar is LL(1), so we can use an LL(1) recursive descent predictive parser, as shown below.
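
For readers who want to check the LL(1) claim themselves, here is a rough sketch that computes the FIRST and FOLLOW sets of the grammar above with the usual fixed-point iteration and then tests that, for every non-terminal, the lookahead sets of its alternatives do not overlap. The helper names (`first_of`, `compute_first`, `compute_follow`, `is_ll1`) are my own and do not appear in the answer's code. The FOLLOW sets it produces for Add_item, More_choices and Exchange are the same token sets that the parser tests before returning an empty result.

```python
EPS, EOF = 'empty', 'EOF'

# The grammar from the answer: non-terminal -> list of alternatives.
GRAMMAR = {
    'Opt_list':     [['Option', 'Opt_list_']],
    'Opt_list_':    [['comma', 'Option', 'Opt_list_'], [EPS]],
    'Option':       [['Choice'], ['Mult']],
    'Choice':       [['Compound', 'More_choices', 'Exchange']],
    'Compound':     [['item', 'Add_item']],
    'Add_item':     [['plus', 'item', 'Add_item'], [EPS]],
    'More_choices': [['slash', 'Compound', 'More_choices'], [EPS]],
    'Exchange':     [['minus', 'num'], [EPS]],
    'Mult':         [['num', 'star', 'Compound']],
}


def first_of(seq, first):
    """FIRST set of a sequence of symbols; contains EPS if the sequence can be empty."""
    out = set()
    for sym in seq:
        if sym == EPS:
            continue
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)
    return out


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:  # iterate until no set grows any more
        changed = False
        for nt, alts in GRAMMAR.items():
            for alt in alts:
                new = first_of(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(first, start='Opt_list'):
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add(EOF)
    changed = True
    while changed:
        changed = False
        for nt, alts in GRAMMAR.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym not in GRAMMAR:
                        continue  # terminals have no FOLLOW set
                    rest = first_of(alt[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:  # everything after sym can vanish
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


def is_ll1(first, follow):
    """True if the alternatives of every non-terminal have disjoint lookahead sets."""
    for nt, alts in GRAMMAR.items():
        seen = set()
        for alt in alts:
            f = first_of(alt, first)
            predict = f - {EPS}
            if EPS in f:
                predict |= follow[nt]
            if predict & seen:
                return False
            seen |= predict
    return True


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
print(sorted(FOLLOW['Add_item']))  # ['EOF', 'comma', 'minus', 'slash']
print(is_ll1(FIRST, FOLLOW))       # True
```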

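If you modify the grammar, you should modify the parser accordingly! The methods that do the actual parsing are called parse_<something>. To change the output of the parser (that is, of the Parser.parse function), change the return values of these parse_<something> functions.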

# FILE: parse.py

import lex

class Parser:

    def __init__(self, lexer):
        self.string, self.tokens = None, None
        self.lexer = lexer
        self.t = self.lexer.token_types

        self.__lookahead = None

    @property
    def lookahead(self):
        if not self.__lookahead:
            try:
                self.__lookahead = next(self.tokens)
            except StopIteration:
                self.__lookahead = lex.Token(self.t.EOF, '', 0, -1)

        return self.__lookahead

    def next(self):
        if self.__lookahead and self.__lookahead.type == self.t.EOF:
            return self.__lookahead

        self.__lookahead = None
        return self.lookahead

    def match(self, token_type):
        if self.lookahead.type == token_type:
            return self.next()

        raise SyntaxError(f'Expected {token_type}, got {self.lookahead.type}', ('<string>', self.lookahead.lineno, self.lookahead.pos, self.string))

    # THE PARSING STARTS HERE
    def parse(self, string):
        # setup
        self.string = string
        self.tokens = self.lexer.tokenize(string)
        self.__lookahead = None
        self.next()

        # do parsing
        ret = [''] + self.parse_opt_list()

        return ' '.join(ret)

    def parse_opt_list(self) -> list:
        ret = self.parse_option(1)
        ret.extend(self.parse_opt_list_(1))

        return ret

    def parse_opt_list_(self, curr_opt_number) -> list:
        if self.lookahead.type in {self.t.EOF}:
            return []

        self.match(self.t.comma)

        ret = self.parse_option(curr_opt_number + 1)
        ret.extend(self.parse_opt_list_(curr_opt_number + 1))

        return ret

    def parse_option(self, opt_number) -> list:
        ret = [f'{opt_number}.']

        if self.lookahead.type == self.t.item:
            ret.extend(self.parse_choice())
        elif self.lookahead.type == self.t.num:
            ret.extend(self.parse_mult())
        else:
            # an option must start with an item (Choice) or a number (Mult)
            raise SyntaxError(f'Expected {self.t.item} or {self.t.num}, got {self.lookahead.type}', ('<string>', self.lookahead.lineno, self.lookahead.pos, self.string))

        ret[-1] += '\n'

        return ret

    def parse_choice(self) -> list:
        c = self.parse_compound()
        m = self.parse_more_choices()
        e = self.parse_exchange()

        if not m:
            if not e:
                ret = f'You may take {" ".join(c)}'
            else:
                ret = f'for every {e} models you may take item {" ".join(c)}'
        else:
            c.extend(m)

            if not e:
                ret = f'each model may take one of: {", ".join(c)}'
            else:
                ret = f'for every {e} models you may exchange the following items with each other: {", ".join(c)}'

        return [ret]


    def parse_compound(self) -> list:
        ret = [self.lookahead.value]

        self.match(self.t.item)
        _ret = self.parse_add_item()

        return [' '.join(ret + _ret)]

    def parse_add_item(self) -> list:
        if self.lookahead.type in {self.t.comma, self.t.minus, self.t.slash, self.t.EOF}:
            return []

        ret = ['with']   
        self.match(self.t.plus)

        ret.append(self.lookahead.value)
        self.match(self.t.item)

        return ret + self.parse_add_item()


    def parse_more_choices(self) -> list:
        if self.lookahead.type in {self.t.comma, self.t.minus, self.t.EOF}:
            return []

        self.match(self.t.slash)
        ret = self.parse_compound()

        return ret + self.parse_more_choices()


    def parse_exchange(self) -> str:
        if self.lookahead.type in {self.t.comma, self.t.EOF}:
            return ''

        self.match(self.t.minus)

        ret = self.lookahead.value
        self.match(self.t.num)

        return ret

    def parse_mult(self) -> list:
        ret = [f'each model may take {self.lookahead.value} of:']

        self.match(self.t.num)
        self.match(self.t.star)

        return ret + self.parse_compound()
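
Since changing the parser's output comes down to changing what the parse_<something> methods return, one possible direction for the question's actual goal (applying a user's choice to a unit) is to return structured data instead of English text. The following is only a compact, self-contained sketch of that idea and not a drop-in replacement for parse.py: the `Option` dataclass, the `OptionParser` class and `parse_options` are hypothetical names, the tokenizer is a bare re.findall that silently ignores invalid characters, and the tail-recursive rules (Opt_list_, Add_item, More_choices) are written as loops.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Option:
    # Each choice is a list of items taken together, e.g. ['itemI', 'itemJ'].
    choices: List[List[str]]
    count: int = 1                    # N from "N*item"
    per_models: Optional[int] = None  # N from "...-N"


class OptionParser:
    def __init__(self, string):
        # Simplified tokenizer (assumption): unknown characters are ignored.
        self.tokens = re.findall(r'[a-zA-Z_]\w*|[0-9]+|[-+*/,]', string)
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self, expected):
        tok = self.peek()
        if tok != expected:
            raise SyntaxError(f'Expected {expected!r}, got {tok!r}')
        self.i += 1

    def take_item(self):
        tok = self.peek()
        if tok is None or not (tok[0].isalpha() or tok[0] == '_'):
            raise SyntaxError(f'Expected an item, got {tok!r}')
        self.i += 1
        return tok

    def take_num(self):
        tok = self.peek()
        if tok is None or not tok.isdigit():
            raise SyntaxError(f'Expected a number, got {tok!r}')
        self.i += 1
        return int(tok)

    def parse_opt_list(self):
        options = [self.parse_option()]
        while self.peek() == ',':
            self.take(',')
            options.append(self.parse_option())
        if self.peek() is not None:
            raise SyntaxError(f'Unexpected token {self.peek()!r}')
        return options

    def parse_option(self):
        tok = self.peek()
        if tok is not None and tok.isdigit():  # Mult -> num star Compound
            count = self.take_num()
            self.take('*')
            return Option(choices=[self.parse_compound()], count=count)
        # Choice -> Compound More_choices Exchange
        choices = [self.parse_compound()]
        while self.peek() == '/':
            self.take('/')
            choices.append(self.parse_compound())
        per_models = None
        if self.peek() == '-':
            self.take('-')
            per_models = self.take_num()
        return Option(choices=choices, per_models=per_models)

    def parse_compound(self):
        items = [self.take_item()]
        while self.peek() == '+':
            self.take('+')
            items.append(self.take_item())
        return items


def parse_options(string):
    return OptionParser(string).parse_opt_list()


for number, option in enumerate(parse_options('itemB/itemC-3, 2*itemD'), start=1):
    print(number, option)
```

With a list of Option objects, the menu text and the handling of the user's selection can both be derived from the same data: the selected index picks an Option, and the selected choice within it names the items to add to the unit.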

Part 3: usage

Here is how to use all of this code:

# FILE: evaluate.py

import enum

from lex import Lexer
from parse import Parser


# these are all the types of tokens present in our grammar
token_types = enum.Enum('Types', 'item num plus minus star slash comma space newline empty EOF')

t = token_types

# these are the regexes that the lexer uses to recognise the tokens
terminals_regexes = {
    t.item: r'[a-zA-Z_]\w*',
    t.num: '0|[1-9][0-9]*',
    t.plus: r'\+',
    t.minus: '-',
    t.star: r'\*',
    t.slash: '/',
    t.comma: ',',
    t.space: r'[ \t]',
    t.newline: r'\n'
}

lexer = Lexer(token_types, terminals_regexes)
parser = Parser(lexer)

string = 'itemA, itemB/itemC-3, 2*itemD, itemE/itemF/itemG, itemH/itemI+itemJ'
print(f'STRING FROM THE QUESTION: {string!r}\nRESULT:')
print(parser.parse(string), '\n\n')


string = input('Enter a command: ')

while string and string.lower() not in {'q', 'quit', 'e', 'exit'}:
    try:
        print(parser.parse(string))
    except SyntaxError as e:
        print(f'    Syntax error: {e}\n    {e.text}\n' + ' ' * (4 + e.offset - 1) + '^\n')

    string = input('Enter a command: ')

Example session:

# python3 evaluate.py

STRING FROM THE QUESTION: 'itemA, itemB/itemC-3, 2*itemD, itemE/itemF/itemG, itemH/itemI+itemJ'
RESULT:
 1. You may take itemA
 2. for every 3 models you may exchange the following items with each other: itemB, itemC
 3. each model may take 2 of: itemD
 4. each model may take one of: itemE, itemF, itemG
 5. each model may take one of: itemH, itemI with itemJ



Enter a command: itemA/b/c/stuff
 1. each model may take one of: itemA, b, c, stuff

Enter a command: 4 * anything
 1. each model may take 4 of: anything

Enter a command: 5 * anything + more
 1. each model may take 5 of: anything with more

Enter a command: a + b + c+ d
 1. You may take a with b with c with d

Enter a command: a+b/c
 1. each model may take one of: a with b, c

Enter a command: itemA/itemB-2
 1. for every 2 models you may exchange the following items with each other: itemA, itemB

Enter a command: itemA+itemB/itemC - 5
 1. for every 5 models you may exchange the following items with each other: itemA with itemB, itemC

Enter a command: q
