Question

This needs a better mind than mine: how can I extract the pattern that a literal regex is looking for?

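My application interacts with an API that supplies a Validation Regex for a specific string the user enters in a particular field. For example, postal codes by country: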

Canada, {"ValidationRegex" : "^[ABCEGHJKLMNPRSTVXY]\\d[ABCEGHJ-NPRSTV-Z][\\s\\-]?\\d[ABCEGHJ-NPRSTV-Z]\\d$" }
United Kingdom, {"ValidationRegex": "((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([AZa-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z]))))[0-9][A-Za-z]{2})|GIR0AA" }

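I am trying to analyze the regex pattern to see whether it is alpha-only, numeric-only, or mixed. Then, based on the validation format tested against various strings, I want to force the keyboard type shown to the user for that particular input field. The keyboard type is one of TEXT, TEL, NUMBER, and the length or range limits of the <input> are set according to the validation requirements. Unfortunately, there doesn't seem to be any other information that would let me determine the keyboard type or string length accurately, short of trying to break down the Validation Regex.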

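This could be used for hundreds of different types of validation regex, so I'm hoping to find a generic way to extract the necessary information from each validation regex, to determine whether the string is numeric-only or requires characters... and the length range.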

Answer 1

While you can sort of automate this by writing your own regex rules, the margin for error is too large for it to be worth relying on. An approach to consider is to dump all of their validation strings into a SQL table, then sort them so that each unique regex pattern is assigned an ID. In a second table you can then match each of those IDs to a class: TEXT, TEL, NUMBER, etc.

So you'd have tables that look like this:

tblRegexStrings
id | uniqueID | location         | regex
1    1          'Canada'           "^[ABCEGHJKLMNPRSTVXY]\\d[ABCEGHJ-NPRSTV-Z][\\s\\-]?\\d[ABCEGHJ-NPRSTV-Z]\\d$"
2    2          'United Kingdom'   "((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([AZa-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z]))))[0-9][A-Za-z]{2})|GIR0AA"
3    2          'France'           "((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([AZa-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z]))))[0-9][A-Za-z]{2})|GIR0AA"

tblRegexClasses
uniqueID | classType
1          'TEXT'
2          'TEL'
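
As a rough illustration, the two tables could be set up and queried like this. This is only a sketch using Python's built-in sqlite3 module with an in-memory database; the column types and the helper class_for are my own assumptions, not part of the API in the question.

```python
import sqlite3

# Patterns as they look after JSON decoding (single backslashes).
CANADA = r"^[ABCEGHJKLMNPRSTVXY]\d[ABCEGHJ-NPRSTV-Z][\s\-]?\d[ABCEGHJ-NPRSTV-Z]\d$"
UK = (r"((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})"
      r"|(([AZa-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z]))))"
      r"[0-9][A-Za-z]{2})|GIR0AA")

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblRegexClasses (
        uniqueID  INTEGER PRIMARY KEY,
        classType TEXT NOT NULL
    );
    CREATE TABLE tblRegexStrings (
        id       INTEGER PRIMARY KEY,
        uniqueID INTEGER NOT NULL REFERENCES tblRegexClasses(uniqueID),
        location TEXT NOT NULL,
        regex    TEXT NOT NULL
    );
""")

# Same rows as in the tables above.
conn.executemany("INSERT INTO tblRegexClasses VALUES (?, ?)",
                 [(1, "TEXT"), (2, "TEL")])
conn.executemany("INSERT INTO tblRegexStrings VALUES (?, ?, ?, ?)",
                 [(1, 1, "Canada", CANADA),
                  (2, 2, "United Kingdom", UK),
                  (3, 2, "France", UK)])


def class_for(location):
    """Return the hand-assigned class for a location, or None if unknown."""
    row = conn.execute(
        """SELECT c.classType
             FROM tblRegexStrings AS s
             JOIN tblRegexClasses AS c ON c.uniqueID = s.uniqueID
            WHERE s.location = ?""",
        (location,),
    ).fetchone()
    return row[0] if row else None
```

Calling class_for("France") looks up the class through the shared uniqueID, so the location rows never store a class of their own.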

If there is a significant amount of reuse in regex patterns (which I suspect there will be), then this will help reduce it to a reasonable number of classes that you can assign by hand. So, if 50 locations use the same regex pattern as the United Kingdom, you can set all of their classes at once by changing the classType of row 2 in tblRegexClasses.
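
The ID assignment itself can be a plain exact-string grouping. Here is a minimal sketch in Python; the function name, the location names, and the sample patterns are all hypothetical, and it assumes two locations only share a class when their regex strings are character-for-character identical.

```python
def assign_pattern_ids(regex_by_location):
    """Give every distinct regex string one ID, shared by all locations using it."""
    pattern_ids = {}    # regex string -> uniqueID
    location_ids = {}   # location     -> uniqueID
    for location, pattern in regex_by_location.items():
        # setdefault keeps the existing ID when the pattern was seen before,
        # otherwise it stores the next free ID.
        location_ids[location] = pattern_ids.setdefault(
            pattern, len(pattern_ids) + 1
        )
    return pattern_ids, location_ids


# Hypothetical input: three locations, two distinct patterns.
sample = {
    "Country A": r"^\d{5}$",
    "Country B": r"^[A-Z]\d[A-Z]$",
    "Country C": r"^\d{5}$",
}
pattern_ids, location_ids = assign_pattern_ids(sample)
```

The length of pattern_ids is then the number of classes that actually have to be assigned by hand.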

If there is still so much data that you need to classify it with regex, but there is also a lot of reuse, I'd still recommend this approach, since you will need to triage your data anyway. Your regex functions may make bad assumptions, and with the class table you can correct those by hand instead of coding an exception for every fringe case.
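
If you do go the automated route, a first pass could look something like the sketch below. It is a deliberately crude text heuristic, not a regex parser: the function guess_class, the REVIEW class, and the rule that "digits plus punctuation" maps to TEL are all assumptions of mine, and anything it is unsure about is handed back for manual triage.

```python
import re


def guess_class(pattern):
    """Very rough guess at TEXT / TEL / NUMBER, or REVIEW when unsure."""
    # Drop {m,n} quantifiers so their digits are not mistaken for literals.
    body = re.sub(r"\{\d+(,\d*)?\}", "", pattern)
    # \w, \W, \D, \S, negated classes and unescaped dots can match almost
    # anything, so leave those patterns for a human.
    if re.search(r"\\[wWDS]|\[\^|(?<!\\)\.", body):
        return "REVIEW"
    # Remove the remaining escapes (\d, \s, \-, ...) before looking for letters.
    no_escapes = re.sub(r"\\.", "", body)
    if re.search(r"[A-Za-z]", no_escapes):
        return "TEXT"
    # Strip digit tokens and common metacharacters; if nothing is left,
    # the pattern only ever asked for digits.
    rest = re.sub(r"\\d|\[0-9\]|[0-9]", "", body)
    rest = re.sub(r"[\^$()?*+|]", "", rest)
    return "NUMBER" if rest == "" else "TEL"


# The Canadian pattern from the question, after JSON decoding.
CANADA = r"^[ABCEGHJKLMNPRSTVXY]\d[ABCEGHJ-NPRSTV-Z][\s\-]?\d[ABCEGHJ-NPRSTV-Z]\d$"
canada_class = guess_class(CANADA)
```

The guesses would only pre-fill classType for each unique pattern; the class table stays the place where wrong guesses get corrected.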
