Question

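I am new to Haskell and functional programming. I have a .txt file that contains some paragraphs, and I want to count the number of words in each paragraph using Haskell.

I have already written the input/output code: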

paragraph-words:: String -> int


no_of_words::IO()
no_of_words=
do
    putStrLn "enter the .txt file name:"
    fileName1<- getLine
    text<- readFile fileName1
            let wordscount= paragraph-words text

Can anyone help me write the function paragraph-words, which should count the number of words in each paragraph?

Answer 1

First: you don't want to be bothered with the dirty IO () here, so the signature should be

wordsPerParagraph :: String -> [Int]

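As for how to do it: you should first split the text into paragraphs. Counting the words in each of them is then quite simple.

What you basically need is to match on empty lines (two adjacent newline characters). So I would first use the lines function, which gives you a list of lines, and then separate these at each empty line: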

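paragraphs :: String -> [String]
paragraphs = split . lines
 where split [] = []
       -- a line followed by an empty line closes the current paragraph
       split (ln : "" : lns) = ln : split lns
       -- otherwise join this line onto the first paragraph of the rest
       split (ln : lns) = let (hd, tl) = splitAt 1 $ split lns
                          in unwords (ln : hd) : tl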
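To put the two steps together, here is a minimal sketch, assuming paragraphs are separated by one or more completely empty lines. The helper name `splitParagraphs` is hypothetical and is written with `break` rather than the `split` helper above; unlike that helper, it skips runs of blank lines instead of producing empty paragraphs.

```haskell
-- Sketch only: `splitParagraphs` is an illustrative helper name.
-- Assumption: a paragraph is a maximal run of non-empty lines.
splitParagraphs :: [String] -> [[String]]
splitParagraphs ls =
  case dropWhile null ls of            -- skip leading blank lines
    []   -> []
    rest -> let (para, remaining) = break null rest
            in para : splitParagraphs remaining

-- Count the words of every paragraph in the text.
wordsPerParagraph :: String -> [Int]
wordsPerParagraph = map (length . words . unwords) . splitParagraphs . lines
```

Lines that contain only spaces are not treated as paragraph breaks in this sketch; if that matters, the `null` tests could be swapped for a whitespace check.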

Answer 2

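First of all, you can't use - in function names; you have to use _ instead (or, better, use camelCase as leftroundabout suggests below).

Here is a function that satisfies your type signature: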

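paragraph_words :: String -> Int
paragraph_words = length . words

It first splits the text into a list of words and then counts them by returning the length of that list.

However, this does not completely solve the problem, because you have not written any code to split the text into paragraphs.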
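A small sketch of why the split is still needed: `words` treats spaces and newlines alike, so a paragraph boundary is invisible to it. The names `paragraphWords` and `sample` are made up for this illustration.

```haskell
-- The same function as above, under a camelCase name (hypothetical).
paragraphWords :: String -> Int
paragraphWords = length . words

-- A sample text with two paragraphs separated by a blank line.
sample :: String
sample = "one two three\nfour\n\nfive six\n"

-- `words sample` yields six words in a single flat list, so
-- `paragraphWords sample` counts the whole text, not each paragraph.
```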

Answer 3

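A list of lines can be split into paragraphs by taking all lines until you reach at least one empty line ("") or the list is exhausted (1). We then skip all consecutive empty lines (2) and apply the same method to the remaining lines: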

type Line      = String
type Paragraph = [String]

parify :: [Line] -> [Paragraph]
parify [] = []
parify ls 
  | null first = parify rest          
  | otherwise  = first : parify rest
  where first = takeWhile (/= "") ls  -- (1) take until empty line or end
        rest  = dropWhile (== "") . drop (length first) $ ls
             -- ^ (2) drop all empty lines

To split a string into its lines, you can simply use lines. To get the number of words in a Paragraph, you just sum the number of words in each line:

singleParagraphCount :: Paragraph -> Int
singleParagraphCount = sum . map lineWordCount

The number of words in each line is simply length . words:

lineWordCount :: Line -> Int
lineWordCount = length . words

All in all, we get the following function:

wordsPerParagraph :: String -> [Int]
wordsPerParagraph = map (singleParagraphCount) . parify . lines
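To connect this back to the input/output code in the question, here is one possible sketch. The names `formatCounts`, `reportCounts` and `noOfWords` are hypothetical, and the counting function is passed in as a parameter so the block stays self-contained; any function of type `String -> [Int]` would fit.

```haskell
-- Format one output line per paragraph, numbering from 1.
formatCounts :: [Int] -> [String]
formatCounts = zipWith line [1 ..]
  where
    line :: Int -> Int -> String
    line i n = "paragraph " ++ show i ++ ": " ++ show n

-- Read a file, apply the given counting function, print the result.
reportCounts :: (String -> [Int]) -> FilePath -> IO ()
reportCounts countFn path = do
  text <- readFile path
  mapM_ putStrLn (formatCounts (countFn text))

-- Prompt for the file name, as in the question, then report.
noOfWords :: (String -> [Int]) -> IO ()
noOfWords countFn = do
  putStrLn "enter the .txt file name:"
  fileName <- getLine
  reportCounts countFn fileName
```

With the definitions from this answer in scope, running `noOfWords wordsPerParagraph` in GHCi should prompt for a file name and print one count per paragraph.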

Counting the number of words in each paragraph using Haskell
