Prolog: a simple lexer/2

Asked: 2016-03-16 14:23:34

Tags: prolog dcg

I need a small lexer/2 in Prolog. This is what I have so far:

tokens(Z) --> "while", tokens(Y), {Z = [ttwhile | Y]}.
tokens(Z) --> "do", tokens(Y), {Z = [ttdo | Y]}.
tokens(Z) --> "endwhile", tokens(Y), {Z = [ttendwhile | Y]}.
tokens(Z) --> "repeat", tokens(Y), {Z = [ttrepeat | Y]}.
tokens(Z) --> "until", tokens(Y), {Z = [ttuntil | Y]}.
tokens(Z) --> "endrepeat", tokens(Y), {Z = [ttendrepeat | Y]}.
tokens(Z) --> "if", tokens(Y), {Z = [ttif | Y]}.
tokens(Z) --> "then", tokens(Y), {Z = [ttthen | Y]}.
tokens(Z) --> "else", tokens(Y), {Z = [ttelse | Y]}.
tokens(Z) --> "endif", tokens(Y), {Z = [ttendif | Y]}.
tokens(Z) --> "exit", tokens(Y), {Z = [ttexit | Y]}.
tokens(Z) --> "other", tokens(Y), {Z = [ttother | Y]}.

% Comparison operators.
tokens(Z) --> "==", tokens(Y), {Z = [equal | Y]}.
tokens(Z) --> "<>", tokens(Y), {Z = [notequal | Y]}.

% Assignment operator.
tokens(Z) --> ":=", tokens(Y), {Z = [:= | Y]}.  

% Boolean constants and operators.
tokens(Z) --> "true", tokens(Y), {Z = [true | Y]}.  
tokens(Z) --> "false", tokens(Y), {Z = [false | Y]}.  
tokens(Z) --> "and", tokens(Y), {Z = [and | Y]}.  
tokens(Z) --> "or", tokens(Y), {Z = [or | Y]}.  

tokens(Z) --> " ", tokens(Y), {Z = Y}.
tokens(Z) --> "\t", tokens(Y), {Z = Y}.  % assumed to be a tab; the original whitespace character was lost

tokens(Z) --> [C], tokens(Y), {name(X, [C]), Z = [X | Y]}.
tokens(Z) --> [], {Z = []}.

Can anyone help me with the next step for lexer/2, so that when I call lexer([while,a,==,b,do,abc,endwhile], R), I get R = [ttwhile, a, equal, b, ttdo, abc, ttendwhile]?

Many thanks.

2 Answers:

Answer 0: (score: 1)

Well, this "glue" more or less does what you asked for:

lexer(L, Tokens) :-
    atomic_list_concat(L, ' ', A),
    atom_codes(A, Cs),
    phrase(tokens(Tokens), Cs).

?- lexer([while,a,==,b,do,abc,endwhile], R).
R = [ttwhile, a, equal, b, ttdo, a, b, c, ttendwhile] ;
R = [ttwhile, a, equal, b, ttdo, a, b, c, e|...] ;

But you should rewrite it in a declarative style:

token(ttwhile) --> "while".
token(ttendwhile) --> "endwhile".
token(ttdo) --> "do".
%...
token(equal) --> "==".
token(notequal) --> "<>".
token(assign) --> ":=". 

% this is wrong: symbols overlap with alphabetic tokens
token(N) --> [C], {atom_codes(N,[C])}.

tokens([]) --> [].
tokens(Ts) --> " ", tokens(Ts).
tokens([T|Ts]) --> token(T), tokens(Ts).

lexer(Cs, Tokens) :-
    phrase(tokens(Tokens), Cs).

and call it with a list of character codes, i.e. a double-quoted string (or a back-quoted one, if you are using SWI-Prolog):

?- lexer(`while abc endwhile`, R).
R = [ttwhile, a, b, c, ttendwhile] ;
R = [ttwhile, a, b, c, e, n, d, ttwhile] ;
...
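
The extra answers on backtracking come from the catch-all single-character rule overlapping with the keyword rules. If only the first tokenisation is wanted, one option is to commit to it with once/1. This is a minimal sketch, assuming SWI-Prolog; lexer_first/2 is a hypothetical name, and identifiers such as abc are still split into single characters here.

```prolog
% Read "..." as code lists, so the grammar bodies and the input agree.
:- set_prolog_flag(double_quotes, codes).

token(ttwhile)    --> "while".
token(ttendwhile) --> "endwhile".
token(ttdo)       --> "do".
% Catch-all: any single character becomes a one-character atom.
token(N)          --> [C], { atom_codes(N, [C]) }.

tokens([])     --> [].
tokens(Ts)     --> " ", tokens(Ts).
tokens([T|Ts]) --> token(T), tokens(Ts).

% lexer_first/2 (hypothetical name): keep only the first solution.
lexer_first(Cs, Tokens) :-
    once(phrase(tokens(Tokens), Cs)).

% ?- atom_codes('while abc endwhile', Cs), lexer_first(Cs, R).
% gives R = [ttwhile, a, b, c, ttendwhile] and no further solutions.
```

Because the result depends on clause order, the longer keyword (endwhile) has to be tried before the catch-all rule for this to pick the intended tokenisation.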

Edit

To tokenize names (lowercase only, for simplicity), replace the token(N) --> [C], {atom_codes(N,[C])}. rule above with

token(N) --> lower_case_chars(Cs), {Cs \= [], atom_codes(N,Cs)}.

lower_case_chars([C|Cs]) --> lower_case_char(C), lower_case_chars(Cs).
lower_case_chars([]) --> [].

lower_case_char(C) --> [C], {C>=0'a, C=<0'z}.

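But this gets rather verbose once you add upper_case_chars, digits, and so on. It is worth generalizing, either by passing the character range boundaries as arguments or by using code_type/2: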

token(N) --> csymf(C), csyms(Cs), {atom_codes(N,[C|Cs])}.

csymf(C) --> [C], {code_type(C,csymf)}.

csyms([C|Cs]) --> [C], {code_type(C,csym)}, csyms(Cs).
csyms([]) --> [].
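
Putting the pieces together, the following is one possible complete lexer, not the answerer's own code. It is a sketch that assumes SWI-Prolog (for code_type/2). The keyword/2 table, blanks//0 and tokens_//1 are names introduced here for illustration. Instead of one grammar rule per keyword, it reads the longest identifier first and then looks it up, so endwhile never decomposes into e, n, d, while.

```prolog
% Read "..." as code lists inside this file.
:- set_prolog_flag(double_quotes, codes).

% keyword/2 (hypothetical helper): source word -> token name.
keyword(while,    ttwhile).
keyword(do,       ttdo).
keyword(endwhile, ttendwhile).
keyword(if,       ttif).
keyword(then,     ttthen).
keyword(else,     ttelse).
keyword(endif,    ttendif).

% Symbolic operators.
token(equal)    --> "==".
token(notequal) --> "<>".
token(assign)   --> ":=".
% A word: take the longest identifier, then decide keyword vs. name.
token(T) -->
    csymf(C), csyms(Cs),
    { atom_codes(W, [C|Cs]),
      (   keyword(W, K) -> T = K ; T = W ) }.

csymf(C) --> [C], { code_type(C, csymf) }.

% The cut makes the identifier match greedy.
csyms([C|Cs]) --> [C], { code_type(C, csym) }, !, csyms(Cs).
csyms([])     --> [].

% Skip any run of whitespace.
blanks --> [C], { code_type(C, space) }, !, blanks.
blanks --> [].

tokens(Ts) --> blanks, tokens_(Ts).

tokens_([T|Ts]) --> token(T), !, tokens(Ts).
tokens_([])     --> [].

lexer(Cs, Tokens) :-
    phrase(tokens(Tokens), Cs).

% ?- atom_codes('while a == b do abc endwhile', Cs), lexer(Cs, R).
% gives R = [ttwhile, a, equal, b, ttdo, abc, ttendwhile].
```

With the cuts in place the lexer is deterministic. It simply fails on a character that no rule covers (for example $), which may or may not be the behaviour you want.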

Answer 1: (score: 0)

How about the following solution?

lexer(I, O) :- tokens(O, I, []).

But call lexer/2 this way:

lexer("while a == b do abc endwhile", R)

I would add a suggestion: rewrite tokens this way

[the rewritten tokens code was not preserved in the source]

P.S.: sorry for my bad English.
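
One caveat when calling a DCG-based lexer with a double-quoted string: in SWI-Prolog 7 and later, "..." is a string object by default rather than a code list, so the grammar sees nothing it can match. A small wrapper can convert the input first. This is a sketch assuming SWI-Prolog; lexer_text/2 is a hypothetical name, and the tiny tokens//1 grammar (blank-separated words) only stands in for a real one.

```prolog
% Read "..." as code lists inside this file.
:- set_prolog_flag(double_quotes, codes).

% Stand-in grammar: split the input into blank-separated words.
tokens(Ts)     --> " ", !, tokens(Ts).
tokens([T|Ts]) --> nonblanks(Cs), { Cs = [_|_], atom_codes(T, Cs) }, !, tokens(Ts).
tokens([])     --> [].

% 32 is the character code of a space.
nonblanks([C|Cs]) --> [C], { C =\= 32 }, !, nonblanks(Cs).
nonblanks([])     --> [].

% lexer_text/2 (hypothetical name): accept an atom, a string or a
% code list, and run the grammar on the corresponding code list.
lexer_text(Text, Tokens) :-
    text_to_string(Text, String),
    string_codes(String, Codes),
    phrase(tokens(Tokens), Codes).

% ?- lexer_text("while abc endwhile", R).
% gives R = [while, abc, endwhile].
```

The alternative is to leave the lexer unchanged and pass a back-quoted code list, or to set the double_quotes flag to codes before entering the query.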