How can I pass a string and convert it into a list of words in SML?
For example: "one two three"
to ["one", "two", "three"]
Answer 0 (score: 1)
You can (and probably should) use String.tokens:
- String.tokens Char.isSpace "one two three";
> val it = ["one", "two", "three"] : string list
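There is also String.fields. The two differ in how they handle consecutive or surplus separators: String.tokens skips them, while String.fields returns an empty string for every pair of adjacent separators and for a separator at either end of the input: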
- String.tokens Char.isSpace "  one  two  three  ";
> val it = ["one", "two", "three"] : string list
- String.fields Char.isSpace "  one  two  three  ";
> val it = ["", "", "one", "", "two", "", "three", "", ""] : string list
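To make the relationship between the two functions concrete, here is a minimal sketch. The helper name nonEmptyFields is made up for illustration and is not part of the Basis Library. It shows that discarding the empty strings from String.fields gives the same list as String.tokens, and that String.fields is the one to use when empty fields carry meaning, as in comma-separated data:

```sml
(* Hypothetical helper: drop the empty strings that String.fields
   produces for adjacent, leading, or trailing separators. *)
fun nonEmptyFields s =
  List.filter (fn field => field <> "") (String.fields Char.isSpace s)

val viaFields = nonEmptyFields "  one  two  three  "
val viaTokens = String.tokens Char.isSpace "  one  two  three  "

(* Both lists are ["one", "two", "three"], so this is true. *)
val same = (viaFields = viaTokens)

(* When an empty field is meaningful, String.fields preserves it. *)
val csvRow = String.fields (fn c => c = #",") "a,,b"   (* ["a", "", "b"] *)
```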
If your string has several potential separators and you are only interested in the words:
fun isWordSep c = Char.isSpace c orelse
( Char.isPunct c andalso c <> #"-" andalso c <> #"'" )
val words = String.tokens isWordSep
This works for one particular definition of a word:
- words "I'm jolly-good. Are you?";
> val it = ["I'm", "jolly-good", "Are", "you"] : string list
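This definition does not hold for all natural language. For example, "e.g." is an abbreviation, not the two words "e" and "g". If you need to be accurate about such cases, you are entering the realm of natural language processing.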
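As a small illustration of that limitation, the sketch below repeats the isWordSep and words definitions from above so that it is self-contained, then applies words to an input containing "e.g.". The input string is an assumed example, not one from the original answer:

```sml
(* Same separator predicate as above: whitespace and punctuation
   separate words, except for hyphens and apostrophes. *)
fun isWordSep c =
  Char.isSpace c orelse
  (Char.isPunct c andalso c <> #"-" andalso c <> #"'")

val words = String.tokens isWordSep

(* The periods count as separators, so the abbreviation is split up. *)
val split = words "e.g. this"   (* ["e", "g", "this"] *)
```

Handling such cases properly would need rules or data beyond a per-character predicate, which is outside what String.tokens can express.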