Automatically generating regular expressions

Time: 2013-05-16 13:33:37

Tags: python

I want a function like the following:

def get_pattern_and_replacement(the_input, output):
    """
    Given the_input and output returns the pattern for matching more general case of the_input and a template string for generating the desired output.

    >>> get_pattern_and_replacement("You're not being nice to me.", "I want to be treated nicely.")
    ("You're not being (?P<word>\w+) to me.", "I want to be treated {{ word }}ly.")
    >>> get_pattern_and_replacement("You're not meeting my needs.", "I want my needs met.")
    ("You're not meeting my (?P<word>\w+).", "I want my {{ word }} met.")
    """

This is for a program that converts unwanted text into the desired text.
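
To make the goal concrete, here is a minimal sketch of how a returned (pattern, replacement) pair might be applied to new text. The helper name `apply_rule` is hypothetical, and filling `{{ word }}` by plain string substitution is an assumption; the question does not say which template engine is intended.

```python
import re

def apply_rule(pattern, replacement, text):
    """Return the filled-in template if text matches pattern, else None.

    Hypothetical helper: the '{{ name }}' placeholders are filled by plain
    string substitution, which is an assumption about the template syntax.
    """
    match = re.match(pattern, text)
    if match is None:
        return None
    result = replacement
    # Substitute every named group captured by the pattern.
    for name, value in match.groupdict().items():
        result = result.replace('{{ %s }}' % name, value)
    return result

pattern = r"You're not being (?P<word>\w+) to me."
replacement = "I want to be treated {{ word }}ly."
print(apply_rule(pattern, replacement, "You're not being kind to me."))
# prints: I want to be treated kindly.
```

Note that the unescaped `.` at the end of the pattern matches any character, not only a literal full stop.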

With the help of Stack Overflow users, my function is now:

def flatten(nested_list):
    return [item for sublist in nested_list for item in sublist]

def get_pattern_and_replacement(the_input, output):
    """
    Given the_input and output returns the pattern for matching more general case of the_input and a template string for generating the desired output.

    >>> get_pattern_and_replacement("You're not being nice to me.", "I want to be treated nicely.")
    ("You're not being (?P<word>\w+) to me.", "I want to be treated {{ word }}ly.")
    >>> get_pattern_and_replacement("You're not meeting my needs.", "I want my needs met.")
    ("You're not meeting my (?P<word>\w+).", "I want my {{ word }} met.")
    """
    # Collect every space-free substring of length 3..11 from each string.
    input_set = set(flatten([[the_input[i: i + j] for i in range(len(the_input) - j + 1) if ' ' not in the_input[i: i + j]] for j in range(3, 12)]))
    output_set = set(flatten([[output[i: i + j] for i in range(len(output) - j + 1) if ' ' not in output[i: i + j]] for j in range(3, 12)]))

    # Use the longest shared substring as the variable part.
    intersection = sorted(input_set & output_set, key=len, reverse=True)
    print(intersection)
    pattern = the_input.replace(intersection[0], r'(?P<word>\w+)')
    replacement = output.replace(intersection[0], '{{ word }}')
    return (pattern, replacement)

1 answer:

Answer 0 (score: 2)

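If you want this kind of template transformation, you will have to write the templates yourself. Recognizing the common parts is a matter of common sense, practice, and creativity; there is no general rule that will do it for you. But you should read a tutorial on regular expressions, which will probably help you think about the problem.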

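You should take a look at the source code of Eliza, the famous chatbot that started it all. Here is the source to a python version. As you will see, the conversation rules are hand-written.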
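
As a rough illustration of what hand-written rules can look like, here is a minimal sketch. It is not taken from any Eliza implementation; the `RULES` table and the `respond` function are hypothetical names, and the two rules simply reuse the examples from the question.

```python
import re

# Hypothetical hand-written rules: (compiled pattern, response template) pairs.
RULES = [
    (re.compile(r"You're not being (\w+) to me\."), "I want to be treated {0}ly."),
    (re.compile(r"You're not meeting my (\w+)\."), "I want my {0} met."),
]

def respond(text):
    """Return the response of the first rule that matches text, or None."""
    for pattern, template in RULES:
        match = pattern.match(text)
        if match:
            # Fill the template with the captured groups, in order.
            return template.format(*match.groups())
    return None

print(respond("You're not meeting my expectations."))
# prints: I want my expectations met.
```

Each new kind of sentence needs its own rule, which is the point of the answer: the person writing the rules decides which part of the sentence varies.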

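If you are hoping for an algorithm that generates the templates, as in the examples you included: that is a very, very hard problem with no reasonable solution. Forget it. Read a regular expressions tutorial instead.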