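Prolog: Reading a natural-language file into a list of words?

Date: 2016-12-05 19:34:28

Tags: io prolog

I want to use Prolog to read a natural-language file into a list of words. Here is a sample of the file: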

The Architecture major occupies the office with the Ctrl+Alt+Del comic poster.
The CSE major belongs to the RPI Flying Club.

I already have code that handles spaces, punctuation and capitalization; I'm just not sure how to open the file and feed its contents to that code.
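The code below reads characters with get0/1, which always takes them from the current input stream. So one classic Edinburgh-style way to point it at a file is to make the file the current input with see/1 and restore the previous input with seen/0. A minimal sketch, assuming SWI-Prolog (which still provides see/1, seen/0 and get0/1); read_codes_from/2 and read_all/1 are hypothetical names, and this consumer only collects raw character codes rather than words:

```prolog
% Sketch, assuming SWI-Prolog. read_codes_from/2 and read_all/1 are
% hypothetical helper names used only to show the input redirection.

% read_codes_from(+File, -Codes): make File the current input, read it all.
read_codes_from(File, Codes) :-
    see(File),          % File becomes the current input used by get0/1
    read_all(Codes),
    seen.               % close File; current input reverts to user_input

% read_all(-Codes): every character code up to end of file (-1).
read_all(Codes) :-
    get0(C),
    (   C =:= -1
    ->  Codes = []
    ;   Codes = [C|Rest],
        read_all(Rest)
    ).
```

With the question's predicates loaded instead, see(File), read_line(Words), seen reads only the first sentence of File into Words, because read_line/1 stops at the first '.' or '?'.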

%%%  Examples:
%%%           % read_line(L).
%%%           The sky was blue, after the rain.
%%%           L = [the,sky,was,blue,',',after,the,rain,'.']
%%%           % read_line(L).
%%%           Which way to the beach?
%%%           L = [which,way,to,the, beach,'?']
%%%

read_line(Words) :- get0(C),
                    read_rest(C,Words).

/* A period or question mark ends the input. */
read_rest(46,['.']) :- !.
read_rest(63,['?']) :- !.

/* Spaces and newlines between words are ignored. */
read_rest(C,Words) :- ( C=32 ; C=10 ) , !,
                     get0(C1),
                     read_rest(C1,Words).

/* Commas between words are kept as ',' tokens. */
read_rest(44,[','|Words]) :- !,
                             get0(C1),
                             read_rest(C1,Words).

/* Otherwise get all of the next word. */
read_rest(C,[Word|Words]) :- lower_case(C,LC),
                             read_word(LC,Chars,Next),
                             name(Word,Chars),
                             read_rest(Next,Words).

/* Space, comma, newline, period or question mark separate words. */
read_word(C,[],C) :- ( C=32 ; C=44 ; C=10 ;
                         C=46 ; C=63 ) , !.

/* Otherwise, get characters, convert alpha to lower case. */
read_word(C,[LC|Chars],Last) :- lower_case(C,LC),
                                get0(Next),
                                read_word(Next,Chars,Last).

/* Convert to lower case if necessary. */
lower_case(C,C) :- ( C <  65 ; C > 90 ) , !.
lower_case(C,LC) :- LC is C + 32.


/* for reference ...
newline(10).
comma(44).
space(32).
period(46).
question_mark(63).
*/
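To read a whole file rather than one sentence typed at the terminal, read_line/1 can be called repeatedly while current input is redirected to the file. A minimal sketch, assuming SWI-Prolog (setup_call_cleanup/3, peek_code/1); read_file_sentences/2, read_sentences/1 and skip_blanks/0 are hypothetical names. The tokenizer clauses repeat the question's code so the block loads on its own, with get0/1 swapped for the ISO get_code/1, lower_case/2 written as an equivalent if-then-else, and one assumed addition: end of file (code -1) also ends a sentence. Without that, a last sentence lacking '.' or '?' makes read_word/3 read -1 over and over, since SWI-Prolog keeps returning -1 past the end of a file by default.

```prolog
% Sketch, assuming SWI-Prolog. read_file_sentences/2, read_sentences/1
% and skip_blanks/0 are hypothetical helper names.

% read_file_sentences(+File, -Sentences): one word list per sentence.
read_file_sentences(File, Sentences) :-
    setup_call_cleanup(
        ( open(File, read, Stream),
          current_input(Old),
          set_input(Stream) ),           % get_code/1 now reads from File
        read_sentences(Sentences),
        ( set_input(Old),                % restore the previous input
          close(Stream) )).

% Call read_line/1 until only spaces/newlines remain before end of file.
read_sentences(Sentences) :-
    skip_blanks,
    peek_code(C),
    (   C =:= -1
    ->  Sentences = []
    ;   read_line(Words),
        Sentences = [Words|Rest],
        read_sentences(Rest)
    ).

% Consume spaces and newlines; peek_code/1 looks ahead without consuming.
skip_blanks :-
    peek_code(C),
    (   ( C =:= 32 ; C =:= 10 )
    ->  get_code(_),
        skip_blanks
    ;   true
    ).

% ---- The question's tokenizer, with get0/1 replaced by get_code/1 ----

read_line(Words) :-
    get_code(C),
    read_rest(C, Words).

read_rest(-1, []) :- !.                  % ADDED (assumption): end of file
read_rest(46, ['.']) :- !.               % a period ends the sentence
read_rest(63, ['?']) :- !.               % a question mark ends the sentence
read_rest(C, Words) :-                   % skip spaces and newlines
    ( C =:= 32 ; C =:= 10 ), !,
    get_code(C1),
    read_rest(C1, Words).
read_rest(44, [','|Words]) :- !,         % keep commas as ',' tokens
    get_code(C1),
    read_rest(C1, Words).
read_rest(C, [Word|Words]) :-            % otherwise read a whole word
    lower_case(C, LC),
    read_word(LC, Chars, Next),
    name(Word, Chars),
    read_rest(Next, Words).

% Separators end a word; -1 (end of file) is the ADDED separator.
read_word(C, [], C) :-
    memberchk(C, [32, 44, 10, 46, 63, -1]), !.
read_word(C, [LC|Chars], Last) :-
    lower_case(C, LC),
    get_code(Next),
    read_word(Next, Chars, Last).

% Map A-Z (65-90) to a-z; leave every other code unchanged.
lower_case(C, LC) :-
    (   C >= 65, C =< 90
    ->  LC is C + 32
    ;   LC = C
    ).
```

For the two-line sample above, saved under a hypothetical name such as clues.txt, read_file_sentences('clues.txt', S) would be expected to bind S to [[the,architecture,major,occupies,the,office,with,the,'ctrl+alt+del',comic,poster,'.'],[the,cse,major,belongs,to,the,rpi,flying,club,'.']]. Checking for end of file before each read_line/1 call, rather than inside it, keeps the question's per-sentence logic unchanged and means trailing newlines never start a new, empty sentence.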

1 answer:

Answer 0 (score: 0)

This is the solution I came up with, but it has a strange bug: if I remove the maplist call from read_file, the whole program hangs. Does anyone know a fix?

/* Opens file and sends to recursive read_line*/
read_file(Hints, File) :-
                    open(File, read, Stream, [type(binary)]),
                    read_line(Hints, Stream),
                    close(Stream),
                    writeln("File read complete"), nl,
                    maplist(writeln, Hints), nl.

/* Reads in a single line, places in master list, continues in file*/
read_line([H|T], Stream) :- 
                    get_byte(Stream, C),
                    read_rest(C, H, Stream),

                    %Breaks on EOF, otherwise continues
                    ( at_end_of_stream(Stream)
                    -> !
                    ; read_line(T, Stream)
                    ).
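A plausible cause, though it cannot be confirmed without the answer's read_rest/3: when at_end_of_stream(Stream) succeeds, the then-branch only cuts, so the tail T is never bound and Hints comes back as a partial list [H1, ..., Hn | _]. maplist(writeln, Hints) happens to bind that tail to [] on its first solution, which would explain why removing it changes the behavior; without it, later code that walks Hints may keep extending the open tail instead of stopping (backtracking into maplist/2 itself would do the same). The usual pattern is to test for end of file before reading and bind the tail to [] there. A minimal sketch, assuming SWI-Prolog; read_rest/3 below is a hypothetical stand-in that just collects raw byte codes up to '.' or '?', and the printing done in read_file/2 is left to the caller:

```prolog
% Sketch, assuming SWI-Prolog. read_file/2 and read_line/2 follow the
% answer; read_rest/3 is a HYPOTHETICAL stand-in, since the answer does
% not show its own version.

read_file(Hints, File) :-
    setup_call_cleanup(
        open(File, read, Stream, [type(binary)]),
        read_line(Hints, Stream),
        close(Stream)).                   % closed even if reading fails

% Test for end of file *before* reading, and close the list there.
read_line(Lines, Stream) :-
    (   at_end_of_stream(Stream)
    ->  Lines = []                        % proper end of list, no open tail
    ;   get_byte(Stream, C),
        read_rest(C, Line, Stream),
        Lines = [Line|Rest],
        read_line(Rest, Stream)
    ).

% Hypothetical read_rest/3: collect byte codes up to and including the
% next '.' (46) or '?' (63), or up to end of file (-1).
read_rest(-1, [], _) :- !.
read_rest(C, [C], _) :-
    ( C =:= 46 ; C =:= 63 ), !.
read_rest(C, [C|Cs], Stream) :-
    get_byte(Stream, Next),
    read_rest(Next, Cs, Stream).
```

For a file containing Hi.Yo? this gives Hints = [[72,105,46],[89,111,63]], a proper list that can be printed or traversed without maplist/2 having to close it first.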