Question

Given the format string:

x = "hello %(foo)s  there %(bar)s"

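Is there a way to get the names of the format variables (without parsing them myself)?

Using a regular expression wouldn't be too hard, but I'd like to know whether there is a more direct way to get them.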

Answer 1

Use a dict subclass that overrides __missing__; you can then collect every missing format variable from it:

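class StringFormatVarsCollector(dict):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.format_vars = []

    def __missing__(self, k):
        # dict.__getitem__ calls this for every key that is not present,
        # so each %(name) field in the string ends up here
        self.format_vars.append(k)
        # return a placeholder that also works for %d, %f, etc.
        return 0

def get_format_vars(s):
    d = StringFormatVarsCollector()
    s % d  # formatting looks up every %(name) field in d
    return d.format_vars

>>> get_format_vars("hello %(foo)s  there %(bar)s")
['foo', 'bar']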
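The same __missing__ trick should also work for new-style format strings, because str.format_map looks names up in the mapping you pass instead of copying it into a plain dict (the Python docs for format_map show a dict subclass with __missing__ for this reason). A minimal sketch, assuming only simple {name} fields; collect_format_map_vars is a hypothetical helper name:

```python
class FormatVarsCollector(dict):
    """Record every key that is looked up but not present."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.format_vars = []

    def __missing__(self, key):
        self.format_vars.append(key)
        return 0  # placeholder that also satisfies numeric specs like {n:d}


def collect_format_map_vars(template):
    # Hypothetical helper: format_map passes our dict subclass through
    # unchanged, so each missing {name} ends up in __missing__.
    collector = FormatVarsCollector()
    template.format_map(collector)
    return collector.format_vars


print(collect_format_map_vars("hello {foo}  there {bar}"))  # ['foo', 'bar']
```

Positional fields such as {} or {0} are not looked up in the mapping, so this sketch would raise for them.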

Answer 2

If you don't want to parse the string, you can use this small function:

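def find_format_vars(string):
    names = {}
    while True:
        try:
            string % names  # raises KeyError for the first name still missing
            break
        except KeyError as e:
            # e.args[0] is the missing key (e.message was Python 2 only);
            # 0 works as a placeholder for %s as well as %d, %f, etc.
            names[e.args[0]] = 0
    return list(names)

>>> print(find_format_vars("hello %(foo)s there %(bar)s"))
['foo', 'bar']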
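The loop above formats the string once for every missing name. A single-pass alternative (not part of this answer) uses the standard library's collections.defaultdict, which inserts each missing key the first time it is looked up, so after one % pass its keys are exactly the referenced names. A sketch; find_format_vars_once is a hypothetical name:

```python
from collections import defaultdict


def find_format_vars_once(template):
    # defaultdict(int) creates a 0 entry for every missing key it is asked for;
    # 0 is also a valid value for %s, %d, %f and similar conversions.
    names = defaultdict(int)
    template % names
    # Dicts keep insertion order (Python 3.7+), so names come out in first-use order.
    return list(names)


print(find_format_vars_once("hello %(foo)s there %(bar)s %(foo)d"))  # ['foo', 'bar']
```

Because the key is stored on first lookup, a name used twice is reported once.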

Answer 3

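Format fields only have meaning to the % operator, not to the string itself. So there is no attribute like str.__format_fields__ that you could read to get the field names.

I'd say a regular expression really is the right approach in this case. You can easily extract the names with re.findall: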

>>> import re
>>> x = "hello %(foo)s  there %(bar)s"
>>> re.findall(r'(?<!%)%\(([^)]+)\)[diouxXeEfFgGcrs]', x)
['foo', 'bar']

Here is an explanation of the pattern:

(?<!%)             # Negative lookbehind to make sure that we do not match %%
%                  # Matches %
\(                 # Matches (
(                  # Starts a capture group
[^)]+              # Matches one or more characters that are not )
)                  # Closes the capture group
\)                 # Matches )
[diouxXeEfFgGcrs]  # Matches one of the characters in the square brackets
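The annotated breakdown above can live in the code itself with re.VERBOSE, which ignores whitespace and # comments inside the pattern. A sketch of the same pattern written that way (FORMAT_FIELD is just an illustrative name):

```python
import re

# Same pattern as the one-liner, with each piece commented via re.VERBOSE.
FORMAT_FIELD = re.compile(r"""
    (?<!%)              # not preceded by %, so an escaped %%(name)s is skipped
    %                   # literal %
    \(                  # literal (
    ([^)]+)             # capture group: one or more characters other than )
    \)                  # literal )
    [diouxXeEfFgGcrs]   # conversion type
""", re.VERBOSE)

x = "hello %(foo)s  there %(bar)s"
print(FORMAT_FIELD.findall(x))  # ['foo', 'bar']
```

Like the one-line version, this pattern misses fields that carry flags or a width, such as %(foo)-10s, and it also skips a real field that directly follows an escaped %%, as in %%%(foo)s.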

Answer 4

New-style string formatting (str.format) has this capability built in, through string.Formatter:

from string import Formatter

f = Formatter()
x = "hello {foo}s  there {bar}s"
parsed = f.parse(x)

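The result of parse is an iterable of tuples of the form:
(literal_text, field_name, format_spec, conversion)

So it is easy to pull out the field_name part of each tuple. Literal text after the last field comes back in a tuple whose field_name is None, so filter that out:

field_names = [tup[1] for tup in parsed if tup[1] is not None]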
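For this example string, parse yields one tuple per field plus a final tuple for the trailing literal text 's', whose field_name is None; that is why a None filter is needed to get only the names. A short sketch that prints every tuple:

```python
from string import Formatter

x = "hello {foo}s  there {bar}s"

# Each tuple is (literal_text, field_name, format_spec, conversion).
for literal_text, field_name, format_spec, conversion in Formatter().parse(x):
    print(repr(literal_text), repr(field_name), repr(format_spec), repr(conversion))
# 'hello ' 'foo' '' None
# 's  there ' 'bar' '' None
# 's' None None None

# Keep only the real field names.
field_names = [name for _, name, _, _ in Formatter().parse(x) if name is not None]
print(field_names)  # ['foo', 'bar']
```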

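For more detail, see the documentation at https://docs.python.org/2/library/string.html#string.Formatter

Single list-comprehension version (Python 2 also exposed the parser as the private str._formatter_parser() method, which does not exist in Python 3):

[tup[1] for tup in Formatter().parse("hello {foo}s  there {bar}s") if tup[1] is not None]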
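One case these versions do not cover is a replacement field nested inside a format spec, such as {value:{width}}: parse returns the spec '{width}' as unparsed text rather than as a separate field. A minimal recursive sketch, assuming you want those inner names too (all_field_names is a hypothetical helper):

```python
from string import Formatter


def all_field_names(template):
    """Return field names in order, including ones nested in format specs."""
    names = []
    for _, field_name, format_spec, _ in Formatter().parse(template):
        if field_name is not None:
            names.append(field_name)
        if format_spec:
            # Nested fields like {width} inside a spec come back as plain text,
            # so run the parser over the spec as well.
            names.extend(all_field_names(format_spec))
    return names


print(all_field_names("{value:{width}.{precision}f} and {name}"))
# ['value', 'width', 'precision', 'name']
```

Field names come back exactly as written, so {user.name} yields 'user.name' and an automatically numbered {} yields an empty string.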

How do I get the list of names used in a format string?
