How do I extract only the words from a text file in F#?

Date: 2017-12-04 12:30:03

Tags: string text f#

I'm trying to extract only the words from a very simple text file:

Please note that you still have an unclaimed iPhone 7. 

We have repeatedly written to you regarding your delivery details. We do not understand why you have not yet confirmed your shipping information so we can send it to your home address. 

Your special price for the brand new  iPhone 7 phone is only £3 with shipping. 

We hope that you'll confirm your information this time. 

I've been using this function, but it fails with an error ("No overloads match for method 'Split'"):

let wordSplit (text:string) = 
  text.Split([|' ','\n','\t',',','.','/','\\','|',':',';'|])
  |> Array.toList

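1 Answer:

Answer 0 (score: 5)

In F#, the items in an array or list are separated by ; (semicolon), not , (comma). Your code creates an array containing a single 10-item tuple, and String.Split has no overload that accepts an array of tuples. If you want an array of ten items, write the following instead: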

let wordSplit (text:string) = 
  text.Split([|' ';'\n';'\t';',';'.';'/';'\\';'|';':';';'|])
  |> Array.toList
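
To see the difference between the two separators in isolation, here is a minimal sketch. The names withCommas and withSemicolons are made up for illustration and are not part of the question's code:

```fsharp
// Commas build a tuple, so this is an array holding ONE 3-item tuple.
let withCommas : (char * char * char)[] = [| ' ', '\n', '\t' |]

// Semicolons separate elements, so this is an array of THREE chars.
let withSemicolons : char[] = [| ' '; '\n'; '\t' |]

printfn "%d" withCommas.Length      // prints 1
printfn "%d" withSemicolons.Length  // prints 3
```

Only the second value is a char[], which is the type String.Split expects for its separators.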

If you also don't want the split operation to return empty strings, you need the version of String.Split that takes a StringSplitOptions parameter:

let wordSplit (text:string) = 
  text.Split([|' ';'\n';'\t';',';'.';'/';'\\';'|';':';';'|], StringSplitOptions.RemoveEmptyEntries)
  |> Array.toList

Note that StringSplitOptions lives in the System namespace, so if there is no open System line at the top of your file, you will need to add one.
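
Putting the pieces together, a self-contained sketch might look like the following. The separators binding and the sample sentence are only illustrative assumptions; the separator characters themselves are the ones from the question:

```fsharp
open System

// Separator characters taken from the question.
let separators = [| ' '; '\n'; '\t'; ','; '.'; '/'; '\\'; '|'; ':'; ';' |]

let wordSplit (text: string) =
    // RemoveEmptyEntries drops the empty strings that appear between
    // adjacent separators, such as ". " at the end of a sentence.
    text.Split(separators, StringSplitOptions.RemoveEmptyEntries)
    |> Array.toList

// Sample input: one sentence from the text in the question.
let words = wordSplit "We hope that you'll confirm your information this time. "
printfn "%A" words
```

For real input, the text could be read with System.IO.File.ReadAllText. If the file uses Windows line endings, '\r' is not in the separator list above and would stay attached to the last word of each line, so it may be worth adding.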