I want to use ANTLR to parse a string of French dates.
I have three types of dates:
date_complete : date_day date_hour
The document I want to parse is simply a chain of date_day, date_time and date_complete (with no separators).
Here are examples of the strings I want to parse:
3 Octobre 2005 12h 13h 5 Octobre 2004 3 Septembre 2005 11h
Expected : date_complete date_time date_day date_complete
12h
Expected : date_time
3 Octobre 2005 5 Octobre 2004 12h 13h 3 Septembre 2005 11h
Expected : date_day date_complete date_time date_complete
**// NEW REQUIREMENTS**
3 Octobre 2005
Expected : date_day
3 Octobre
Expected : date_day
3
Expected : date_day
I have tried many things, but ANTLR v3 keeps telling me that my grammar is ambiguous:
warning(200): /meleo.dates/src/Grammar.g:25:48:
Decision can match input such as "{FRI, MON..TUE, WED} TWO_DIGITS DECEMBER FOUR_DIGITS {FRI..HOURG, MON..WED}" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
|---> date_day (date_day | date_complete | date_hour)+
What is the proper way to write this kind of grammar?
Here is the grammar:
grammar MeleoDates;
options {
language = Java;
}
@header {
package meleo.data.dates ;
import rainstudios.meleo.crawler.data.Dates ;
import rainstudios.meleo.crawler.data.EventDate ;
}
@lexer::header {
package meleo.data.dates ;
import rainstudios.meleo.crawler.data.EventDate ;
}
input returns [Dates dates]
@init {Dates r = new Dates() ; } :
( date
{r.addDay($date.date);}
DATE_SEP?)+
EOF
{$dates = r ;}
;
date returns [EventDate date] :
(date_complete)=> date_complete
{$date = $date_complete.date;}
| date_day
{$date = $date_day.date;}
| date_time
{$date = $date_time.date;}
;
date_complete returns [EventDate date]
@init {EventDateBuilder builder = new EventDateBuilder() ; } :
day=date_day
{builder.addDay($day.date);}
HOUR_SEP?
time=date_time
{builder.addTime($time.date);}
{$date = builder.toDate();}
;
date_day returns [EventDate date]
@init {EventDateBuilder builder = new EventDateBuilder() ; } :
(
dayOfWeek=(
MON
| TUE
| WED
| THU
| FRI
| SAT
| SUN
)?
(day=INT)=> INT
{builder.addDay($day.text);}
( m=ID
{builder.addMonth($m.text);}
year=INT ?
{builder.addMonth($year.text);}
)?
)
{$date = builder.toDate();}
;
date_time returns [EventDate date]
@init {EventDateBuilder builder = new EventDateBuilder() ; } :
TIME
{builder.addTime($TIME.text);}
{$date = builder.toDate();}
;
month : DECEMBER | JANUARY ;
MON
: 'lundi'
| 'lun'
;
TUE
: 'mardi'
| 'mar'
;
WED
: 'mercredi'
| 'mer'
;
THU
: 'jeudi'
| 'jeu'
;
FRI
: 'vendredi'
| 'ven'
;
SAT
: 'samedi'
| 'sam'
;
SUN
: 'dimanche'
| 'dim'
;
DECEMBER : 'dec' | 'decembre' ;
JANUARY : 'jan' | 'janvier' ;
DATE_SEP : 'et'| ',' | '-';
HOUR_SEP : 'à' | 'a' ;
INT : ('0'..'9')+;
TIME_SEP : ':' | 'h' ;
TIME : INT TIME_SEP INT?;
ID : ('a'..'z'|'A'..'Z')+;
WS : (' ' | '\t' | '\n' | '\r' | '\f')+ {$channel = HIDDEN;};
**Edit: added new requirements (optional month and year for date_day)**
Answer 0 (score: 2)
Consider using a syntactic predicate:
input : date+;
date : (date_complete) => date_complete
| date_day
| date_time
;
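This effectively tells ANTLR to try matching date_complete before it tries any alternative that shares a prefix with it (that may not be a technically precise description, but you get the idea). Without the predicate, the date rule can match the same input through more than one alternative, and ANTLR (v3, at any rate) cannot resolve that on its own.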
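To make the "try date_complete first, otherwise fall back" idea concrete, here is a minimal hand-written sketch in plain Java. It is not the code ANTLR generates: the class name `PredicateSketch`, the regex tokenizer and the `mark`/rewind bookkeeping are all assumptions made for illustration, and it merges the speculative match and the real match into one step, whereas ANTLR evaluates the predicate and then parses the chosen alternative again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical hand-written analogue of the predicated `date` rule. */
final class PredicateSketch {
    // Token shapes assumed for this sketch: TIME = digits + 'h', INT = digits, ID = letters.
    // Anything else (whitespace included) is skipped by Matcher.find().
    private static final Pattern TOKEN = Pattern.compile("(\\d+h)|(\\d+)|([a-zA-Z]+)");

    private final List<String> types = new ArrayList<>();
    private int pos = 0;

    private PredicateSketch(String input) {
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            types.add(m.group(1) != null ? "TIME" : m.group(2) != null ? "INT" : "ID");
        }
    }

    // Consumes the next token if it has the given type.
    private boolean accept(String type) {
        if (pos < types.size() && types.get(pos).equals(type)) {
            pos++;
            return true;
        }
        return false;
    }

    // date_day : INT ID INT
    private boolean dateDay() {
        int mark = pos;
        if (accept("INT") && accept("ID") && accept("INT")) {
            return true;
        }
        pos = mark; // rewind on failure
        return false;
    }

    // date_time : TIME
    private boolean dateTime() {
        return accept("TIME");
    }

    // date_complete : date_day date_time
    private boolean dateComplete() {
        int mark = pos;
        if (dateDay() && dateTime()) {
            return true;
        }
        pos = mark; // the speculative attempt failed, so undo it
        return false;
    }

    /** Labels each date in the input, trying date_complete before the shorter alternatives. */
    static String classify(String input) {
        PredicateSketch p = new PredicateSketch(input);
        StringBuilder out = new StringBuilder();
        while (p.pos < p.types.size()) {
            if (p.dateComplete()) {
                out.append("date_complete ");
            } else if (p.dateDay()) {
                out.append("date_day ");
            } else if (p.dateTime()) {
                out.append("date_time ");
            } else {
                throw new IllegalArgumentException("No alternative matches at token " + p.pos);
            }
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(classify("3 Octobre 2005 12h 13h 5 Octobre 2004 3 Septembre 2005 11h"));
    }
}
```

The order of the `if` chain in `classify` plays the role of the predicate: because date_complete is attempted first and rewound on failure, a date_day followed by a time is never split into a date_day and a separate date_time.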
Here is a complete test grammar:
grammar AmbiguousDates;
input : date+ EOF;
date : (date_complete)=> date_complete
{System.out.println("date_complete: " + $date_complete.str);}
| date_day
{System.out.println("date_day: " + $date_day.str);}
| date_time
{System.out.println("date_time: " + $date_time.str);}
;
date_complete returns [String str]
: date_day date_time
{$str = String.format("\%s \%s", $date_day.str, $date_time.str);}
;
date_day returns [String str]
: day=INT ID year=INT
{$str = String.format("\%s \%s \%s", $day.text, $ID.text, $year.text);}
;
date_time returns [String str]
: TIME
{$str = $TIME.text;}
;
INT : ('0'..'9')+;
TIME : INT 'h';
ID : ('a'..'z'|'A'..'Z')+;
WS : (' '|'\t'|'\f'|'\r'|'\n')+ {skip();};
Test input:
3 Octobre 2005 12h 13h 5 Octobre 2004 3 Septembre 2005 11h
Output:
date_complete: 3 Octobre 2005 12h
date_time: 13h
date_day: 5 Octobre 2004
date_complete: 3 Septembre 2005 11h
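Answer 1 (score: 0)
I don't think you gain anything by using ANTLR here. You can reach your goal with SimpleDateFormat#parse plus a little extra work to check for the trailing time marker (i.e. the "h"), as follows:
package question;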
import java.text.ParseException;
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Locale;
public class FrenchDateParser {
private static SimpleDateFormat date_complete = new SimpleDateFormat("d MMMM yyyy h", Locale.FRENCH);
private static SimpleDateFormat date_day = new SimpleDateFormat("d MMMM yyyy", Locale.FRENCH);
private static SimpleDateFormat date_time = new SimpleDateFormat("h", Locale.FRENCH);
private static String parse(String input) {
ParsePosition parsePosition = new ParsePosition(0);
StringBuilder stringBuilder = new StringBuilder();
int inputSize = input.length();
while (parsePosition.getIndex() < inputSize) {
int startingParsePositionIndex = parsePosition.getIndex();
if (date_complete.parse(input, parsePosition) != null) {
if (parsePosition.getIndex() < inputSize && input.charAt(parsePosition.getIndex()) == 'h') {
stringBuilder.append("date_complete ");
parsePosition.setIndex(parsePosition.getIndex() + 1);
continue;
}
parsePosition.setIndex(startingParsePositionIndex);
}
if (date_day.parse(input, parsePosition) != null) {
stringBuilder.append("date_day ");
continue;
}
if (date_time.parse(input, parsePosition) != null) {
if (parsePosition.getIndex() < inputSize && input.charAt(parsePosition.getIndex()) == 'h') {
stringBuilder.append("date_time ");
parsePosition.setIndex(parsePosition.getIndex() + 1);
continue;
}
parsePosition.setIndex(startingParsePositionIndex);
}
throw new IllegalArgumentException("Unable to parse input [" + input + "]");
}
return stringBuilder.toString().trim();
}
public static void main(String... args) throws ParseException {
String[] inputs = {"3 Octobre 2005 12h 13h 5 Octobre 2004 3 Septembre 2005 11h", "12h",
"3 Octobre 2005 5 Octobre 2004 12h 13h 3 Septembre 2005 11h"};
String[] expecteds = {"date_complete date_time date_day date_complete", "date_time",
"date_day date_complete date_time date_complete"};
for (int i = 0; i < inputs.length; i++) {
String actual = parse(inputs[i]);
System.out.println(expecteds[i].equals(actual));
}
}
}
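The loop above can try one format after another because of how `parse(String, ParsePosition)` reports failure: it returns null and leaves the position index where it was, so the next format starts from the same offset. The short sketch below illustrates that contract; the class name `ParsePositionSketch` and the helper `attempt` are made up for this example and are not part of the answer's code.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

/** Hypothetical demo of the ParsePosition behaviour the loop relies on. */
final class ParsePositionSketch {
    /** Tries one pattern at offset 0 and reports the outcome and the resulting index. */
    static String attempt(String pattern, String input) {
        ParsePosition pos = new ParsePosition(0);
        Date parsed = new SimpleDateFormat(pattern, Locale.FRENCH).parse(input, pos);
        return (parsed != null ? "matched" : "failed") + ", index=" + pos.getIndex();
    }

    public static void main(String[] args) {
        // "12h" is not a day, so the day pattern fails and the index stays at 0.
        System.out.println(attempt("d MMMM yyyy", "12h"));
        // The hour pattern consumes the digits and stops in front of the 'h'.
        System.out.println(attempt("h", "12h"));
    }
}
```

This is also why the answer only resets the index by hand in the branches where a format did match but the following character was not 'h': in that case the position has already advanced and has to be rewound explicitly.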