Convert a string to a list of words

Time: 2018-03-15 16:41:01

Tags: sml smlnj

How can I pass in a string and convert it to a list of words in SML?

For example: "one two three" -> ["one", "two", "three"]

1 answer:

Answer 0 (score: 1):

You can (and probably should) use String.tokens:

- String.tokens Char.isSpace "one two three";
> val it = ["one", "two", "three"] : string list

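There is also String.fields. The two differ in how they handle consecutive or redundant separators: String.tokens skips them, while String.fields returns an empty string for every gap between adjacent separators and at either end of the input: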

- String.tokens Char.isSpace "  one  two  three  ";
> val it = ["one", "two", "three"] : string list
- String.fields Char.isSpace "  one  two  three  ";
> val it = ["", "", "one", "", "two", "", "three", "", ""] : string list
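
To make the difference concrete, here is a small sketch assuming comma-separated input. The names isComma and line and the sample record are made up for illustration; only String.fields and String.tokens come from the Basis Library. Keeping empty fields matters when position carries meaning, as in a record with a blank column:

```sml
(* Hypothetical example: a comma-separated record whose middle column is empty. *)
fun isComma c = c = #","
val line = "alice,,42"

(* fields keeps the empty column, so positions stay aligned *)
val asFields = String.fields isComma line   (* ["alice", "", "42"] *)

(* tokens treats adjacent separators as one, so the empty column disappears *)
val asTokens = String.tokens isComma line   (* ["alice", "42"] *)
```

For splitting prose into words, dropping the empty strings is usually what you want, which is why String.tokens is the better fit here.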

If your string has several potential separators and you are only interested in the words:

(* A separator is whitespace, or any punctuation other than hyphen and apostrophe. *)
fun isWordSep c = Char.isSpace c orelse
                ( Char.isPunct c andalso c <> #"-" andalso c <> #"'" )
val words = String.tokens isWordSep

This works for one definition of a word:

- words "I'm jolly-good.  Are you?";
> val it = ["I'm", "jolly-good", "Are", "you"] : string list

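Not all natural language adheres to this definition. For example, the abbreviation "e.g." is one word, not the two words "e" and "g". If you need that kind of accuracy, you are entering the territory of natural language processing.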
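
A quick way to see this limitation is to run the same splitter on a sentence containing an abbreviation. The definitions of isWordSep and words are repeated from above so the snippet runs on its own; the input sentence and the name split are made up for illustration:

```sml
fun isWordSep c = Char.isSpace c orelse
                ( Char.isPunct c andalso c <> #"-" andalso c <> #"'" )
val words = String.tokens isWordSep

(* The period counts as punctuation, so "e.g." is broken into "e" and "g". *)
val split = words "See e.g. this"   (* ["See", "e", "g", "this"] *)
```

Adding more exceptions to isWordSep would not help here, since a period is a separator at the end of a sentence but part of the word inside an abbreviation.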