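I am trying to write a function that removes common words, but I don't know how to obtain or find a list of common words. Do I need to create the list of common words myself? Thanks.
Problem: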
Takes a list of strings and drops any word that is within the top 20
most commonly used in English. Returns a list of strings without those words.
Example result:
dropCommonWords ["the","planet","of","the","apes"]
["planet","apes"]
Here is my dropletters code:
dropletters xs = filter (\x -> x `elem` ['a'..'z'] ) xs
Answer 0 (score: 2)
You need a list of common words, and then you filter for the words that are not elements of that list:
dropCommonWords xs = filter (\x -> x `notElem` common) xs
  where common = ["the", "be", "to", "of", "and", "a", "in", "that", "have", "I", "it", "for", "not", "on", "with", "he", "as", "you", "do", "at"]
Result:
Prelude> dropCommonWords ["the","planet","of","the","apes"]
["planet","apes"]
Answer 1 (score: 0)
You can also use basic recursion:
import Data.Char (toLower)

dropCommonWords :: [String] -> [String]
dropCommonWords [] = []
dropCommonWords (x:xs)
  | map toLower x `notElem` commonWords = x : dropCommonWords xs
  | otherwise                           = dropCommonWords xs
  where
    -- "i" is stored in lowercase because each word is lowercased before the lookup
    commonWords = ["the", "be", "to", "of", "and", "a", "in", "that", "have", "i", "it", "for", "not", "on", "with", "he", "as", "you", "do", "at"]
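This version also converts each word to lowercase first with map toLower x, because you may get strings such as "THE", which should still count as a common word, just in a different case.
Here is how the code above behaves: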
*Main> dropCommonWords ["the","planet","of","the","apes"]
["planet","apes"]
*Main> dropCommonWords ["THE","planet","of","the","apes"]
["planet","apes"]
Note: using filter is the better choice here. I did not post that version because the answer above already covers it.
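For completeness, the two answers can be combined: filter does the traversal and map toLower handles the casing. The following is only a minimal sketch under a few assumptions. The top-level name commonWords is chosen here for illustration, and the 20 words are the same list used in the answers above, stored in lowercase; the exercise itself does not say which list counts as the "top 20".

```haskell
import Data.Char (toLower)

-- Assumed word list: the 20 words from the answers above, all in lowercase
-- so that they can be compared against lowercased input.
commonWords :: [String]
commonWords =
  [ "the", "be", "to", "of", "and", "a", "in", "that", "have", "i"
  , "it", "for", "not", "on", "with", "he", "as", "you", "do", "at"
  ]

-- Keep only the words whose lowercase form is not in commonWords.
-- The words that are kept retain their original casing.
dropCommonWords :: [String] -> [String]
dropCommonWords = filter (\w -> map toLower w `notElem` commonWords)
```

With this version, "THE" and "I" are dropped as well as "the", while words that are kept are returned unchanged. For a list of only 20 words, a linear notElem lookup is likely good enough; a set-based lookup would only start to matter for a much longer word list.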