Building a RegExp from data

Time: 2014-03-17 12:04:34

Tags: javascript regex dictionary

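I'm building an application that recognizes schedules written in natural language, for example "Each 2 weeks check car before 6p.m" or "run for 2 hours".

The input can be written in any valid form, with numbers given either as digits or as words (6 can be written as 6 or "six").

I've already put together some dictionaries and some rules.

Part of the dictionary and rules: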

plan.rules = {
    language : "EN", 
    dictionary : {
        numbers : {
            ones : [
                ["zero"], 
                ["one", "first", "once"],
                ["two", "second", "twice"],
                ["three", "third", "thrice"],
                ["four", "fourth"],
                ["five", "fifth"],
                ["six", "sixth"],
                ["seven", "seventh"],
                ["eight", "eighth"],
                ["nine", "ninth"]
            ],
            teens : [
                [],
                ["ten", "tenth"],
                ["eleven", "eleventh"],
                ["twelve", "twelfth"],
                ["thirteen", "thirteenth"],
                ["fourteen", "fourteenth"],
                ["fifteen", "fifteenth"],
                ["sixteen", "sixteenth"],
                ["seventeen", "seventeenth"],
                ["eighteen", "eighteenth"],
                ["nineteen", "nineteenth"]
            ],
            tens : [
                [],
                ["ten"],
                ["twenty"],
                ["thirty"],
                ["forty"],
                ["fifty"],
                ["sixty"],
                ["seventy"],
                ["eighty"],
                ["ninety"]
            ]
        },
        peroids : {
            minute : ["min", "minute", "minutes"],
            hour : ["hour", "hours"],
            day : ["day", "days"],
            week : ["week", "weeks"],
            month : ["month", "months"],
            year : ["year", "years"]
        }
    },
    rules : {
        each : [
            "each {peroid}",
            "each {number} {peroid}",
            "every {peroid}",
            "every {number} {peroid}"
        ],
        for : [
            "for {peroid}",
            "for {number} {peroid}"
        ]
    }
}

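Based on the data above, take for example "Each two weeks check something":

  • "two" matches the number 2

  • "weeks" matches the peroid "week"

So the sentence matches the pattern "each {number} {peroid}".

I'm trying to come up with an algorithm to analyze the input. I was thinking of running a huge loop over the dictionary and the rules, but maybe it's possible to build a regExp from all these cases?

If I'm going about this the wrong way, how could it be done?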

1 Answer:

Answer 0: (score: 1)

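You can do it with regular expressions, but I think you'll end up with some very unwieldy ones.

Just as an example: if your text always contains the word each, followed by some text and a number, then some more text and a period, you could try something like this (if you decide to extend it, you'll need many more number combinations):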

[Ee]ach.*(one|first|1|two|second|2).*(minutes?|hours?|days?|weeks?|months?|years?)

"Each two weeks check something" matches two and weeks

"Each first day check something else" matches first and day
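A pattern like this does not have to be typed by hand; it can be generated from word lists. The following is only a minimal sketch under assumptions: the numbers and periods arrays are a cut-down stand-in for the dictionary in the question, escapeRegExp and buildEachRegex are hypothetical helper names (not part of any library), and the case-insensitive flag is used in place of [Ee].

```javascript
// Cut-down word lists (assumed for illustration; the real dictionary is larger).
const numbers = ["one", "first", "1", "two", "second", "2"];
const periods = ["minute", "hour", "day", "week", "month", "year"];

// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build /each.*\b(one|first|...)\b.*\b(minute|hour|...)s?\b/i from the lists.
function buildEachRegex(numberWords, periodWords) {
    const numberGroup = numberWords.map(escapeRegExp).join("|");
    const periodGroup = periodWords.map(escapeRegExp).join("|");
    return new RegExp(
        "each.*\\b(" + numberGroup + ")\\b.*\\b(" + periodGroup + ")s?\\b",
        "i"
    );
}

const eachRegex = buildEachRegex(numbers, periods);

// The plural "s" sits outside the second group, so the singular form is captured.
const a = "Each two weeks check something".match(eachRegex);
console.log(a[1], a[2]); // two week

const b = "Each first day check something else".match(eachRegex);
console.log(b[1], b[2]); // first day
```

The word boundaries (\b) are a design choice of this sketch: they keep a short entry such as "one" from matching inside a longer word.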


Sentences such as "Each first day of the week do something" or "Each 3rd week of the month do something" will not be handled correctly.
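To see what goes wrong with those two sentences, here is a short check. It is a sketch that uses a pattern of the same shape as the one above, written here with an optional plural s; the variable names are only illustrative.

```javascript
// Same shape as the pattern in this answer; the optional plural "s" is an assumption.
const pattern = /[Ee]ach.*(one|first|1|two|second|2).*(minutes?|hours?|days?|weeks?|months?|years?)/;

// The greedy ".*" runs on to the LAST period word, so "week" is captured, not "day".
const first = "Each first day of the week do something".match(pattern);
console.log(first[1], first[2]); // first week

// "3rd" is not in the number alternation, so there is no match at all.
const third = "Each 3rd week of the month do something".match(pattern);
console.log(third); // null
```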

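Natural language offers many possible ways of saying each {number} {period}, and if you want to capture all of them, a regular expression will become very hard to work with.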
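One alternative to a single large regex, along the lines of the dictionary loop mentioned in the question, is to split the sentence into words, turn each known word into a placeholder, and look for a rule template in the resulting sequence. This is only a sketch under assumptions: the word maps are a small hand-written subset of the dictionary, the placeholder spelling {peroid} follows the question's rules, and tokenize and matchRule are hypothetical names.

```javascript
// Small subset of the question's data, inverted into word -> value lookups (assumed).
const numberWords = new Map([
    ["one", 1], ["first", 1], ["two", 2], ["second", 2], ["three", 3], ["third", 3]
]);
const peroidWords = new Map([
    ["hour", "hour"], ["hours", "hour"], ["day", "day"], ["days", "day"],
    ["week", "week"], ["weeks", "week"], ["month", "month"], ["months", "month"]
]);
const rules = [
    "each {peroid}", "each {number} {peroid}",
    "every {peroid}", "every {number} {peroid}",
    "for {peroid}", "for {number} {peroid}"
];

// Turn every word into a slot: a placeholder for known words, the word itself otherwise.
function tokenize(sentence) {
    return sentence.toLowerCase().split(/\s+/).filter(Boolean).map((word) => {
        if (/^\d+$/.test(word)) return { slot: "{number}", value: Number(word) };
        if (numberWords.has(word)) return { slot: "{number}", value: numberWords.get(word) };
        if (peroidWords.has(word)) return { slot: "{peroid}", value: peroidWords.get(word) };
        return { slot: word, value: word };
    });
}

// Find the first rule whose words line up with a consecutive run of slots.
function matchRule(sentence) {
    const tokens = tokenize(sentence);
    for (const rule of rules) {
        const parts = rule.split(" ");
        for (let i = 0; i + parts.length <= tokens.length; i++) {
            const span = tokens.slice(i, i + parts.length);
            if (span.every((token, k) => token.slot === parts[k])) {
                // Keep only the values that filled a placeholder, in rule order.
                const values = span.filter((t) => t.slot.startsWith("{")).map((t) => t.value);
                return { rule, values };
            }
        }
    }
    return null;
}

const r1 = matchRule("Each two weeks check something");
console.log(r1.rule, r1.values); // rule "each {number} {peroid}", values 2 and "week"

const r2 = matchRule("run for 2 hours");
console.log(r2.rule, r2.values); // rule "for {number} {peroid}", values 2 and "hour"
```

Because the values are taken from the words that actually filled the rule, "Each first day of the week do something" yields 1 and "day" in this sketch. A form such as "3rd" is still not recognized until it is added to the word lists.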