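I have a Java string that contains a time (in am/pm format), a specific keyword such as SET or UNSET, and some other unrelated words. For example: "SET time as 10:30 am" or "UNSET time as 10:30 pm".
I already have this regex:
regex_am_pm = "(?:\\s{1,2}[1-9]|\\s{1,2}0[1-9]|\\s{1,2}1[0-2]):[0-5][0-9]\\s{0,2}(?:am|pm|AM|PM)";
How can I extend the regex so that it checks for the keyword SET or UNSET before the time regex?
Please help.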
regex_am_pm = "(SET|UNSET)(?:\\s{1,2}[1-9]|\\s{1,2}0[1-9]|\\s{1,2}1[0-2]):[0-5][0-9]\\s{0,2}(?:am|pm|AM|PM)";
I want to find SET or UNSET followed by the time regex, with anything allowed in between the two.
Expected output:
String passed = "hey Set clock to 10:30 PM";
// if SET is found before the time regex:
//     call SetMethod(String time)
// else:
//     call UNSetMethod(String time)
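Answer 0 (score: 0)
Are you using any NLP parser? A few points. Consider this text:
I do not need to buy a set of three earrings for my daughter any more. I am going to sleep until 10:30 PM.
Here you also have an ambiguity ("set" does not mean what you intend it to mean). You also need the keyword and the time to belong to the same sentence or phrase.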
String regex_am_pm = "(\\s((UN)?SET)\\s(.*?)[1-2]\\d:[0-5]\\d)\\s(am|pm|AM|PM)";
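By the way, you will run into a problem if you see text like this:
I need to unset my app and set the clock to 10:30 AM.
If you want to test the regex somewhere other than Java, use \ in place of \\: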
(\s((UN)?SET)\s(.*?)[1-2]\d:[0-5]\d)\s(am|pm|AM|PM)
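The problem with the "unset my app and set the clock" sentence can be reproduced with a small sketch. It makes two assumptions that are not part of the answer: the pattern is compiled with `Pattern.CASE_INSENSITIVE` (as written it only matches uppercase SET/UNSET), and the class and method names are invented for illustration. Note also that the hour part `[1-2]\d` only accepts two-digit hours starting with 1 or 2. Because `find()` returns the leftmost match, the captured keyword is the first one in the sentence, not the one that actually belongs to the time.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordAmbiguitySketch {
    // Pattern from the answer above; CASE_INSENSITIVE is an added assumption.
    private static final Pattern P = Pattern.compile(
        "(\\s((UN)?SET)\\s(.*?)[1-2]\\d:[0-5]\\d)\\s(am|pm|AM|PM)",
        Pattern.CASE_INSENSITIVE);

    // Returns the captured keyword in upper case, or null when nothing matches.
    static String firstKeyword(String text) {
        Matcher m = P.matcher(text);
        return m.find() ? m.group(2).toUpperCase() : null;
    }

    public static void main(String[] args) {
        // Both "unset" and "set" occur; the leftmost keyword wins.
        System.out.println(firstKeyword("I need to unset my app and set the clock to 10:30 AM.")); // UNSET
        // Only one keyword, so the result is unambiguous.
        System.out.println(firstKeyword("Please set the clock to 10:30 AM.")); // SET
    }
}
```

In the first sentence the lazy `(.*?)` group swallows "my app and set the clock to", which is why a regex alone cannot decide which keyword governs the time.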
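Answer 1 (score: 0)
I think you should validate the incoming string to make sure that it actually contains the word SET or UNSET (in any letter case), that the word time is in the string so as to confirm what SET applies to, and that the string also contains a time.
String rules:
You can try code like this: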
// Required imports (place these at the top of the source file)
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The Regular Expression (RegEx) we are going to use...
String regEx = "(?i)(\\bSET\\b|\\bUNSET\\b)(.*?\\btime\\b.*?)?(\\d{2}\\:\\d{2}(\\s+)?(am|pm)?)";
String incomingString = "set time as 10:30 am";
String setType = "NONE AVAILABLE!"; // Default
String setTime = "NONE AVAILABLE!"; // Default
String timeFormat = ""; // Default
// Does the incoming String meet our requirements?
if (incomingString.trim().matches(regEx)) {
    // Yes it does...
    System.out.println("String contains valid content.");
    // Get the required information from the input String...
    Pattern r = Pattern.compile(regEx);
    Matcher m = r.matcher(incomingString);
    if (m.find()) {
        setType = m.group(1).toUpperCase();
        // trim() drops whitespace that the optional (\\s+)? may have captured
        setTime = m.group(3).trim().toUpperCase();
    }
    // Is the time valid?
    timeFormat = validateTime(setTime); // see validateTime() method
    if (timeFormat.equals("NONE")) {
        // Reset to defaults
        setType = "NONE AVAILABLE!";
        setTime = "NONE AVAILABLE!";
    }
}
// Display the findings...
System.out.println("Set Type: " + setType);
System.out.println("Time: " + setTime + " (in " + timeFormat + ")");
The validateTime() method:
/**
 * If valid this method will return a string indicating the Time Format
 * otherwise it will return the uppercase word string: "NONE".<br>
 *
 * @param time (String) The time to validate in HH:mm or hh:mm (am/pm).<br>
 *
 * @return (String) Either "24 Hour Format", "12 Hour Format", or "NONE" if
 * validation fails.
 */
public static String validateTime(String time) {
    String fmt = "NONE";
    // 12 Hour Time...
    if (time.matches("(1[012]|[1-9]):[0-5][0-9](\\s)?(?i)(am|pm)")) {
        fmt = "12 Hour Format";
    }
    // 24 Hour Time...
    else if (time.matches("([01]?[0-9]|2[0-3]):[0-5][0-9]")) {
        fmt = "24 Hour Format";
    }
    return fmt;
}
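To connect this back to the question, here is a minimal sketch of how the captured keyword could drive the choice between the two handlers. The handlers `setMethod` and `unsetMethod` are hypothetical stand-ins for the asker's SetMethod and UNSetMethod, and the class name and return strings are invented for illustration. The pattern is the one from this answer, so the word time is required between the keyword and a time that does not follow the keyword directly, and the question's own example is rejected.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SetUnsetDispatchSketch {
    // Same pattern as in the snippet above.
    private static final Pattern COMMAND = Pattern.compile(
        "(?i)(\\bSET\\b|\\bUNSET\\b)(.*?\\btime\\b.*?)?(\\d{2}\\:\\d{2}(\\s+)?(am|pm)?)");

    // Hypothetical handlers; real ones would act on the clock instead.
    static String setMethod(String time) { return "SET -> " + time; }
    static String unsetMethod(String time) { return "UNSET -> " + time; }

    // Picks a handler from capture group 1 and passes it the time in group 3.
    static String dispatch(String input) {
        Matcher m = COMMAND.matcher(input);
        if (!m.find()) {
            return "NO MATCH";
        }
        String keyword = m.group(1).toUpperCase();
        String time = m.group(3).trim().toUpperCase();
        return keyword.equals("SET") ? setMethod(time) : unsetMethod(time);
    }

    public static void main(String[] args) {
        System.out.println(dispatch("set time as 10:30 am"));       // SET -> 10:30 AM
        System.out.println(dispatch("please unset time 10:30 pm")); // UNSET -> 10:30 PM
        System.out.println(dispatch("hey Set clock to 10:30 PM"));  // NO MATCH (no word "time")
    }
}
```

If inputs without the word time should be accepted as well, the second group would have to be relaxed, at the cost of the ambiguity discussed in the other answer.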
Regex explanation (the pattern is shown as a Java string literal, so \\ stands for a single regex backslash):
(?i)(\\bSET\\b|\\bUNSET\\b)(.*?\\btime\\b.*?)?(\\d{2}\\:\\d{2}(\\s+)?(am|pm)?)
(?i)
  match the remainder of the pattern with the following effective flag: i
  i modifier: insensitive. Case insensitive match (ignores case of [a-zA-Z])
1st Capturing Group: (\\bSET\\b|\\bUNSET\\b)
  1st Alternative: \\bSET\\b
    \\b assert position at a word boundary: (^\w|\w$|\W\w|\w\W)
    SET matches the characters SET literally (case insensitive)
    \\b assert position at a word boundary: (^\w|\w$|\W\w|\w\W)
  2nd Alternative: \\bUNSET\\b
    \\b assert position at a word boundary: (^\w|\w$|\W\w|\w\W)
    UNSET matches the characters UNSET literally (case insensitive)
    \\b assert position at a word boundary: (^\w|\w$|\W\w|\w\W)
2nd Capturing Group: (.*?\\btime\\b.*?)?
  ? Quantifier: Matches between zero and one times, as many times as possible, giving back as needed (greedy)
  .*? matches any character (except for line terminators)
    *? Quantifier: Matches between zero and unlimited times, as few times as possible, expanding as needed (lazy)
  \\b assert position at a word boundary: (^\w|\w$|\W\w|\w\W)
  time matches the characters time literally (case insensitive)
  \\b assert position at a word boundary: (^\w|\w$|\W\w|\w\W)
  .*? matches any character (except for line terminators)
3rd Capturing Group: (\\d{2}\\:\\d{2}(\\s+)?(am|pm)?)
  \\d{2} matches a digit (equal to [0-9])
    {2} Quantifier: Matches exactly 2 times
  \\: matches the character : literally (case insensitive)
  \\d{2} matches a digit (equal to [0-9])
    {2} Quantifier: Matches exactly 2 times
  4th Capturing Group: (\\s+)?
    ? Quantifier: Matches between zero and one times, as many times as possible, giving back as needed (greedy)
    \\s+ matches any whitespace character (equal to [\r\n\t\f\v ])
  5th Capturing Group: (am|pm)?
    ? Quantifier: Matches between zero and one times, as many times as possible, giving back as needed (greedy)
    1st Alternative: am
      am matches the characters am literally (case insensitive)
    2nd Alternative: pm
      pm matches the characters pm literally (case insensitive)
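To see how the capturing groups described above map onto a concrete input, the following sketch prints every group for the sample string used in this answer. The class and method names are made up for illustration; only the pattern comes from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupBreakdownSketch {
    private static final Pattern P = Pattern.compile(
        "(?i)(\\bSET\\b|\\bUNSET\\b)(.*?\\btime\\b.*?)?(\\d{2}\\:\\d{2}(\\s+)?(am|pm)?)");

    // Returns groups 1..5 of the first match, or an empty array if there is none.
    static String[] groups(String input) {
        Matcher m = P.matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            out[i - 1] = m.group(i); // null for an optional group that did not participate
        }
        return out;
    }

    public static void main(String[] args) {
        String[] g = groups("set time as 10:30 am");
        for (int i = 0; i < g.length; i++) {
            System.out.println("group " + (i + 1) + ": [" + g[i] + "]");
        }
        // group 1: [set]
        // group 2: [ time as ]
        // group 3: [10:30 am]
        // group 4: [ ]
        // group 5: [am]
    }
}
```

Group 3 is the one passed on to validateTime(), while groups 4 and 5 are nested inside it and only matter if the am/pm marker is needed separately.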