Question

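I have a Java string that contains a time (in am/pm format), a specific keyword such as SET or UNSET, and some other unrelated words. For example: "set the time to 10:30 am" or "unset the time of 10:30 pm".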

I already have this regular expression:

regex_am_pm = "(?:\\s{1,2}[1-9]|\\s{1,2}0[1-9]|\\s{1,2}1[0-2]):[0-5][0-9]\\s{0,2}(?:am|pm|AM|PM)";

How can I extend the regex so that it also checks for the keyword SET or UNSET before the time?

Please help. This is what I tried:

regex_am_pm = "(SET|UNSET)(?:\\s{1,2}[1-9]|\\s{1,2}0[1-9]|\\s{1,2}1[0-2]):[0-5][0-9]\\s{0,2}(?:am|pm|AM|PM)";

I want to search for SET or UNSET followed by the time regex, with anything allowed in between.

Expected output

String passed = "hey Set clock to 10:30 PM";
// if SET is found before the time (matched by the time regex):
//     SetMethod(time);
// else (UNSET is found before the time):
//     UNSetMethod(time);
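A minimal sketch of the behaviour described above, assuming a 12-hour time with am/pm and case-insensitive keywords. The class name SetUnsetDispatch and the handlers setMethod/unsetMethod are hypothetical stand-ins for the question's SetMethod/UNSetMethod. The lazy .*? between the keyword and the time is what allows any text in between.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SetUnsetDispatch {
    // Keyword (SET or UNSET, any case) followed, possibly after other words,
    // by a 12-hour time such as "10:30 PM". Group 1 = keyword, group 2 = time.
    static final Pattern COMMAND = Pattern.compile(
        "(?i)\\b(SET|UNSET)\\b.*?\\b((?:1[0-2]|0?[1-9]):[0-5][0-9]\\s{0,2}(?:am|pm))\\b");

    // Hypothetical handlers standing in for the question's SetMethod/UNSetMethod.
    static String setMethod(String time)   { return "SET " + time; }
    static String unsetMethod(String time) { return "UNSET " + time; }

    // Finds the first keyword + time pair and calls the matching handler.
    static String dispatch(String input) {
        Matcher m = COMMAND.matcher(input);
        if (!m.find()) {
            return "no command found";
        }
        String keyword = m.group(1);
        String time = m.group(2);
        return keyword.equalsIgnoreCase("SET") ? setMethod(time) : unsetMethod(time);
    }

    public static void main(String[] args) {
        System.out.println(dispatch("hey Set clock to 10:30 PM"));
        System.out.println(dispatch("please unset the alarm for 7:05 am"));
        System.out.println(dispatch("nothing to do here"));
    }
}
```

Because of the \b word boundaries, the "set" inside "unset" is not mistaken for a separate SET keyword.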

Answer 1

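Are you using any NLP parser? A few suggestions:

You should split the text into sentences first. I assume you don't want your program to run into a case like this:

I don't need to buy a set of three earrings for my daughter anymore. I'm going to sleep until 10:30 PM.

You also have an ambiguity here ("set" in this sentence means something different from what you intend). And the meaning you need is confined to a single sentence/phrase.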
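A small sketch of the sentence-splitting idea, using the JDK's java.text.BreakIterator to split the text and then running a keyword + time search on each sentence separately. The class SentenceScopedSearch, its methods, and the exact pattern are illustrative assumptions, not part of the answer.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class SentenceScopedSearch {
    // Keyword, then anything, then a 12-hour time (same idea as the question's regex).
    static final Pattern COMMAND = Pattern.compile(
        "(?i)\\b(?:UN)?SET\\b.*?\\b(?:1[0-2]|0?[1-9]):[0-5][0-9]\\s{0,2}(?:am|pm)\\b");

    // Splits text into sentences using the locale-aware BreakIterator.
    static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            result.add(text.substring(start, end).trim());
        }
        return result;
    }

    // Keeps only the sentences in which the keyword and the time occur together.
    static List<String> commandSentences(String text) {
        List<String> hits = new ArrayList<>();
        for (String s : sentences(text)) {
            if (COMMAND.matcher(s).find()) {
                hits.add(s);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "I don't need to buy a set of three earrings for my daughter anymore. "
                + "I'm going to sleep until 10:30 PM.";
        // Searching the whole text finds "set ... 10:30 PM"; searching per sentence does not.
        System.out.println(COMMAND.matcher(text).find());
        System.out.println(commandSentences(text));
    }
}
```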

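Within a single phrase/sentence, you can do this:

String regex_am_pm = "(\\s((UN)?SET)\\s(.*?)\\b(?:1[0-2]|0?[1-9]):[0-5]\\d)\\s(am|pm|AM|PM)";

The hour part (?:1[0-2]|0?[1-9]) accepts 1 to 12 with an optional leading zero. Note that the keyword match is case-sensitive, so "Set" is not matched unless you prefix the pattern with (?i).

By the way, this breaks on text like the following, because UNSET is matched first and the lazy (.*?) carries it all the way to a time that actually belongs to SET:

I need to UNSET my application and SET the clock to 10:30 AM.

If you want to test the regex outside Java, write \ in place of \\:

(\s((UN)?SET)\s(.*?)\b(?:1[0-2]|0?[1-9]):[0-5]\d)\s(am|pm|AM|PM)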
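To see the UNSET/SET problem concretely, here is a small sketch; the class LazyGapDemo and the method keywordFor are hypothetical. It uses a variant of this answer's regex whose hour part is (?:1[0-2]|0?[1-9]); that part is non-capturing, so group 2 still holds the keyword.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyGapDemo {
    // Group 1 = leading space through minutes, group 2 = keyword, group 3 = "UN" if present,
    // group 4 = text between keyword and time, group 5 = AM/PM.
    static final Pattern P = Pattern.compile(
        "(\\s((UN)?SET)\\s(.*?)\\b(?:1[0-2]|0?[1-9]):[0-5]\\d)\\s(am|pm|AM|PM)");

    // Returns the keyword of the first match, or null if nothing matches.
    static String keywordFor(String text) {
        Matcher m = P.matcher(text);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(keywordFor("Please SET the clock to 10:30 AM"));
        // The first keyword wins, even though the time belongs to the later SET.
        System.out.println(keywordFor("I need to UNSET my application and SET the clock to 10:30 AM."));
    }
}
```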

Answer 2

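I think you should validate the incoming string to make sure that it actually contains the word SET or UNSET (regardless of letter case), that the word time appears in the string so the purpose of SET is confirmed, and that the string also contains a time.

String rules:

Must contain the word SET or UNSET (in any letter case) as a whole word; the \b word boundaries in the regex do not match SET inside words such as RESET;
Must contain the word time (in any letter case) somewhere after SET or UNSET, to remove some ambiguity;
Must contain a time in the format hh:mm (12-hour clock) or HH:mm (24-hour clock); the regex expects a two-digit hour (\d{2}). AM or PM is optional, and letter case does not matter. The actual time must be placed somewhere after the word time, again to remove some ambiguity.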
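The whole-word requirement in the first rule can be checked in isolation. This sketch (the class WordBoundaryDemo and method hasKeyword are hypothetical) runs only the keyword part of the regex against a few inputs.

```java
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // Keyword part of the answer's regex: SET or UNSET as a whole word, any case.
    static final Pattern KEYWORD = Pattern.compile("(?i)(\\bSET\\b|\\bUNSET\\b)");

    // True when SET or UNSET appears as a whole word somewhere in the text.
    static boolean hasKeyword(String text) {
        return KEYWORD.matcher(text).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Set the time", "unset it", "reset the time", "settings time"}) {
            System.out.println(s + " -> " + hasKeyword(s));
        }
    }
}
```

"reset" and "settings" are rejected because there is no word boundary between the letters around "set".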

Here is some code you can try:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The Regular Expression (RegEx) we are going to use...
String regEx = "(?i)(\\bSET\\b|\\bUNSET\\b)(.*?\\btime\\b.*?)?(\\d{2}\\:\\d{2}(\\s+)?(am|pm)?)";

String incomingString = "set time as 10:30 am";

String setType = "NONE AVAILABLE!"; // Default
String setTime = "NONE AVAILABLE!"; // Default
String timeFormat = "";             // Default

// Compile once and search with find(), so text before SET/UNSET
// (e.g. "hey, please set the time to 10:30 pm") does not prevent a match.
Pattern r = Pattern.compile(regEx);
Matcher m = r.matcher(incomingString);

// Does the incoming String meet our requirements?
if (m.find()) {
    // Yes it does...
    System.out.println("String contains valid content.");

    // Get the required information from the input String...
    setType = m.group(1).toUpperCase();
    // trim() drops whitespace that (\\s+)? captures when no am/pm follows
    setTime = m.group(3).trim().toUpperCase();

    // Is the time valid?
    timeFormat = validateTime(setTime); // see validateTime() method
    if (timeFormat.equals("NONE")) {
        // Reset to defaults
        setType = "NONE AVAILABLE!";
        setTime = "NONE AVAILABLE!";
    }
}
// Display the findings...
System.out.println("Set Type: " + setType);
System.out.println("Time:     " + setTime + " (in " + timeFormat + ")");

The validateTime() method:

/**
 * If valid this method will return a string indicating the Time Format 
 * otherwise it will return the uppercase word string: "NONE".<br>
 * 
 * @param time (String) The time to validate in HH:mm or hh:mm (am/pm).<br>
 * 
 * @return (String) Either "24 Hour Format", "12 Hour Format", or "NONE" if 
 * validation fails.
 */
public static String validateTime(String time) {
    String fmt = "NONE";
    // 12 Hour Time (an optional leading zero allows e.g. "09:30 AM")...
    if (time.matches("(1[012]|0?[1-9]):[0-5][0-9]\\s*(?i)(am|pm)")) {
        fmt = "12 Hour Format";
    }        
    // 24 Hour Time...
    else if (time.matches("([01]?[0-9]|2[0-3]):[0-5][0-9]")) {
        fmt = "24 Hour Format";
    }
    return fmt;
}
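For convenience, here is a self-contained sketch that puts the two snippets into one runnable class. The class name TimeCommandDemo, the parse method, and its "TYPE|TIME|FORMAT" return value are hypothetical. It searches with Matcher.find(), trims group 3, and lets the 12-hour check accept a leading zero, since the main regex only matches two-digit hours.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TimeCommandDemo {
    // Keyword, optional "... time ..." part, then hh:mm with optional am/pm.
    static final Pattern COMMAND = Pattern.compile(
        "(?i)(\\bSET\\b|\\bUNSET\\b)(.*?\\btime\\b.*?)?(\\d{2}:\\d{2}(\\s+)?(am|pm)?)");

    // 12-hour check (optional leading zero) first, then 24-hour; "NONE" otherwise.
    static String validateTime(String time) {
        if (time.matches("(?i)(1[012]|0?[1-9]):[0-5][0-9]\\s*(am|pm)")) {
            return "12 Hour Format";
        }
        if (time.matches("([01]?[0-9]|2[0-3]):[0-5][0-9]")) {
            return "24 Hour Format";
        }
        return "NONE";
    }

    // Returns "TYPE|TIME|FORMAT", or null when the input is not a valid command.
    static String parse(String input) {
        Matcher m = COMMAND.matcher(input);
        if (!m.find()) {
            return null;
        }
        String type = m.group(1).toUpperCase();
        String time = m.group(3).trim().toUpperCase(); // drop whitespace captured by (\\s+)?
        String format = validateTime(time);
        return format.equals("NONE") ? null : type + "|" + time + "|" + format;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "set time as 10:30 am",
            "hey, please UNSET the alarm time of 22:15",
            "set time to 25:99"
        };
        for (String s : inputs) {
            System.out.println(s + " -> " + parse(s));
        }
    }
}
```

"25:99" passes the main regex (it only checks for digits) but is rejected by validateTime(), which is why the two-step check is useful.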

Explanation of the regular expression:

(?i)(\\bSET\\b|\\bUNSET\\b)(.*?\\btime\\b.*?)?(\\d{2}\\:\\d{2}(\\s+)?(am|pm)?)

(?i)
match the remainder of the pattern with the following effective flag: i
i modifier: insensitive. Case-insensitive match (ignores case of [a-zA-Z])

1st Capturing Group:   (\\bSET\\b|\\bUNSET\\b)
1st Alternative:  \\bSET\\b
\\b assert position at a word boundary: (^\w|\w$|\W\w|\w\W)
SET matches the characters SET literally (case insensitive)
\\b assert position at a word boundary: (^\w|\w$|\W\w|\w\W)

2nd Alternative:  \\bUNSET\\b
\\b assert position at a word boundary: (^\w|\w$|\W\w|\w\W)
UNSET matches the characters UNSET literally (case insensitive)
\\b assert position at a word boundary: (^\w|\w$|\W\w|\w\W)

2nd Capturing Group:   (.*?\\btime\\b.*?)?
? Quantifier: Matches between zero and one times, as many times as possible, giving back as needed (greedy)
. matches any character (except for line terminators)
*? Quantifier: Matches between zero and unlimited times, as few times as possible, expanding as needed (lazy)
\\b assert position at a word boundary: (^\w|\w$|\W\w|\w\W)
time matches the characters time literally (case insensitive)
\\b assert position at a word boundary: (^\w|\w$|\W\w|\w\W)
. matches any character (except for line terminators)
*? Quantifier: Matches between zero and unlimited times, as few times as possible, expanding as needed (lazy)

3rd Capturing Group:  (\\d{2}\\:\\d{2}(\\s+)?(am|pm)?)
\\d matches a digit (equal to [0-9])
{2} Quantifier: Matches exactly 2 times
\\: matches the character : literally (the escape is optional; a plain : means the same)
\\d matches a digit (equal to [0-9])
{2} Quantifier: Matches exactly 2 times

4th Capturing Group:   (\\s+)?
? Quantifier: Matches between zero and one times, as many times as possible, giving back as needed (greedy)
\\s matches any whitespace character (in Java, equal to [ \t\n\x0B\f\r])
+ Quantifier: Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)

5th Capturing Group:   (am|pm)?
? Quantifier: Matches between zero and one times, as many times as possible, giving back as needed (greedy)
1st Alternative:   am
am matches the characters am literally (case insensitive)
2nd Alternative:  pm
pm matches the characters pm literally (case insensitive)
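To check the group breakdown against real input, this sketch (the class GroupDemo and method groups are hypothetical) prints groups 1 to 5 of the first match. It also shows a side effect of (\\s+)?: when no am/pm follows the time, the whitespace after it still ends up inside group 3, and group 5 is null because it did not participate.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    static final Pattern P = Pattern.compile(
        "(?i)(\\bSET\\b|\\bUNSET\\b)(.*?\\btime\\b.*?)?(\\d{2}\\:\\d{2}(\\s+)?(am|pm)?)");

    // Returns groups 1-5 of the first match (null for groups that did not participate),
    // or null if there is no match at all.
    static String[] groups(String input) {
        Matcher m = P.matcher(input);
        if (!m.find()) {
            return null;
        }
        String[] g = new String[5];
        for (int i = 1; i <= 5; i++) {
            g[i - 1] = m.group(i);
        }
        return g;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"set time as 10:30 am", "set time to 14:30 now"}) {
            String[] g = groups(s);
            System.out.println(s);
            for (int i = 0; i < g.length; i++) {
                System.out.println("  group " + (i + 1) + ": [" + g[i] + "]");
            }
        }
    }
}
```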

Regex for searching specific keywords along with a regex for time search
