Haskell: how to get a list of common words

Time: 2018-01-06 20:47:46

Tags: haskell

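I am trying to write a function that drops common words, but I don't know how to obtain or find a list of common words. Do I need to create the list of common words myself? Thanks.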

The problem:

Takes a list of strings and drops any word that is within the top 20  
most commonly used in English. Returns a list of strings without those words.

Example result:

dropCommonWords ["the","planet","of","the","apes"]
["planet","apes"]

Here is my dropletters code:

dropletters xs = filter (\x -> x `elem` ['a'..'z'] ) xs

2 answers:

Answer 0 (score: 2)

You need a list of common words, and then you keep only the words that are not elements of that list:

dropCommonWords xs = filter (\x -> x `notElem` common ) xs
  where common = ["the", "be", "to", "of", "and", "a", "in", "that", "have", "I", "it", "for", "not", "on", "with", "he", "as", "you", "do", "at"]

Result:

Prelude> dropCommonWords ["the","planet","of","the","apes"]
["planet","apes"]
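The same idea can be written a little more compactly. The following is only a sketch: it assumes the 20-word list from this answer, moves it to a top-level binding (the name commonWords is my own choice here), adds a type signature, and uses an operator section instead of the lambda. Matching is still case-sensitive.

```haskell
-- Hypothetical top-level name; the words are the ones listed in the answer above.
commonWords :: [String]
commonWords =
  [ "the", "be", "to", "of", "and", "a", "in", "that", "have", "I"
  , "it", "for", "not", "on", "with", "he", "as", "you", "do", "at" ]

-- Keep only the words that do not occur in commonWords (case-sensitive).
dropCommonWords :: [String] -> [String]
dropCommonWords = filter (`notElem` commonWords)
```

The section (`notElem` commonWords) is equivalent to \x -> x `notElem` commonWords, so the behaviour should be the same as the version above.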

Answer 1 (score: 0)

You can also use plain recursion:

import Data.Char (toLower)

-- Drop every word that, once lowercased, appears in commonWords.
dropCommonWords :: [String] -> [String]
dropCommonWords [] = []
dropCommonWords (x:xs)
    | map toLower x `notElem` commonWords = x : dropCommonWords xs
    | otherwise = dropCommonWords xs
    -- Entries are all lowercase because each word is lowercased before the lookup.
    where commonWords = ["the", "be", "to", "of", "and", "a", "in", "that", "have", "i", "it", "for", "not", "on", "with", "he", "as", "you", "do", "at"]

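This version also converts each word to lowercase with map toLower x before the check, because you may get strings such as "THE", which should still count as a common word and differs only in case.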

Here is how the code above behaves:

*Main> dropCommonWords ["the","planet","of","the","apes"]
["planet","apes"]
*Main> dropCommonWords ["THE","planet","of","the","apes"]
["planet","apes"]

Note: using filter is the better choice here; I did not post that version because the answer above already covers it.
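For completeness, a filter-based version with the same case-insensitive check might look like the sketch below. The names dropCommonWordsCI and commonWords are hypothetical, chosen only for this illustration. The list entries are written in lowercase (including "i") so that they still match after map toLower is applied to each input word.

```haskell
import Data.Char (toLower)

-- Hypothetical top-level list; every entry is lowercase on purpose.
commonWords :: [String]
commonWords =
  [ "the", "be", "to", "of", "and", "a", "in", "that", "have", "i"
  , "it", "for", "not", "on", "with", "he", "as", "you", "do", "at" ]

-- Keep a word only if its lowercased form is not a common word.
-- The original spelling of every kept word is preserved.
dropCommonWordsCI :: [String] -> [String]
dropCommonWordsCI = filter (\w -> map toLower w `notElem` commonWords)
```

Only the comparison is done in lowercase; words that survive the filter are returned exactly as they were passed in.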