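I'm trying to parse e-books in .txt format to learn more about attoparsec and Haskell (I'm new to both). Here I'm trying to count the number of sentences in a given text file. This is my code: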
```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text
import qualified Data.Text as T
import qualified Data.Text.IO as Txt
import Data.List
import Control.Applicative ((<*>), (*>), (<$>), (<|>), pure)

data Prose = Prose {
  word :: [Char]
} deriving Show

-- Run p if it matches; otherwise succeed without consuming input.
optional :: Parser a -> Parser ()
optional p = option () (try p *> pure ())

-- Punctuation that may appear inside a sentence.
specialChars :: [Char]
specialChars = ['-', '_', '…', '“', '”', '\"', '\'', '’', '@', '#', '$',
                '%', '^', '&', '*', '(', ')', '+', '=', '~', '`', '{', '}',
                '[', ']', '/', ':', ';', ',']

-- A sentence: letters, digits, whitespace and the special characters above.
inputSentence :: Parser Prose
inputSentence = Prose <$> many1' (letter <|> digit <|> space <|> satisfy (inClass specialChars))

-- Sentences are separated by one or more whitespace characters or '.', '?', '!'.
sentenceSeparator :: Parser ()
sentenceSeparator = many1 (space <|> satisfy (inClass ".?!")) >> pure ()

sentenceParser :: String -> [Prose]
sentenceParser str = case parseOnly wp (T.pack str) of
    Left err -> error err
    Right x -> x
  where
    wp = optional sentenceSeparator *> inputSentence `sepBy1` sentenceSeparator

main :: IO ()
main = do
  input <- readFile "test.txt"
  let sentences = sentenceParser input
  print sentences
  print $ length sentences
```
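A trimmed, self-contained version of the parser above makes the split easy to see without a file. It keeps the same shape (a `many1'` sentence body and a separator made of whitespace or `.?!`), but the character class is cut down to `",;:"` and the input is a hypothetical inline string, not the contents of `test.txt`. Because `.` is never part of the body, every period, including the one after the initial `G`, is consumed as a sentence boundary.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative ((<|>))
import Data.Attoparsec.Text
import qualified Data.Text as T

-- Same shape as inputSentence, with specialChars trimmed to ",;:" for this demo.
body :: Parser String
body = many1' (letter <|> digit <|> space <|> satisfy (inClass ",;:"))

-- Same as sentenceSeparator: whitespace or '.', '?', '!'.
separator :: Parser ()
separator = many1 (space <|> satisfy (inClass ".?!")) >> pure ()

-- Split the input the same way sentenceParser does, but return Either instead of calling error.
pieces :: T.Text -> Either String [String]
pieces = parseOnly (option () separator *> body `sepBy1` separator)

main :: IO ()
main = print (pieces "Daniel G. Brinton wrote it.")
-- prints: Right ["Daniel G","Brinton wrote it"]
```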
If you want the full picture of what I'm doing, follow this link to the GitHub repo. My problem is that when I try to parse a text file with the input:
I get the following output:
So my question is: how can I make input such as `Daniel G. Brinton` parse as just one sentence? I have tried using `isHorizontalSpace`, but to no avail.
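Changing the whitespace predicate (for example to `isHorizontalSpace`, which matches only spaces and tabs) only changes which whitespace may sit inside a sentence; the `.` after `G` is still outside the body and still ends the sentence. One possible direction, sketched below under stated assumptions, is to tokenize the sentence body so that an initial is a single token. The heuristic is an assumption, not a rule of English: a single capital letter followed by `.` at the start of a token is treated as an initial. All names (`initial`, `wordToken`, `otherToken`, `splitSentences`) are hypothetical, and the explicit `specialChars` list is replaced by a simpler rule (any non-alphanumeric, non-terminator character).

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative ((<|>))
import Data.Attoparsec.Text
import qualified Data.Char as C
import qualified Data.Text as T

-- Sentence terminators, as in the question's sentenceSeparator.
isTerminator :: Char -> Bool
isTerminator = inClass ".?!"

-- Heuristic (assumption): one capital letter directly followed by '.'
-- is an initial such as "G.", so its period does not end the sentence.
initial :: Parser T.Text
initial = do
  c <- satisfy C.isUpper
  _ <- char '.'
  pure (T.pack [c, '.'])

-- A maximal run of letters and digits. The whole run is taken at once,
-- so the "A." at the end of "USA." is never re-read as an initial.
wordToken :: Parser T.Text
wordToken = takeWhile1 C.isAlphaNum

-- Whitespace and punctuation other than the terminators
-- (a simpler rule than the explicit specialChars list).
otherToken :: Parser T.Text
otherToken = takeWhile1 (\c -> not (C.isAlphaNum c) && not (isTerminator c))

-- A sentence is a non-empty run of tokens; 'initial' is tried first
-- so that "G." is consumed as one token.
sentence :: Parser T.Text
sentence = T.concat <$> many1 (initial <|> wordToken <|> otherToken)

-- One or more whitespace or terminator characters between sentences.
separator :: Parser ()
separator = skipMany1 (satisfy (\c -> C.isSpace c || isTerminator c))

splitSentences :: T.Text -> Either String [T.Text]
splitSentences = parseOnly (option () separator *> sentence `sepBy1` separator)

main :: IO ()
main = print (splitSentences "Daniel G. Brinton wrote it. It is short!")
-- prints: Right ["Daniel G. Brinton wrote it","It is short"]
```

The trade-off of this heuristic is that a sentence that genuinely ends in a lone capital letter ("... Plan B. Then ...") is merged with the next one, and other abbreviations such as "Dr." are not covered; a list of known abbreviations could be added as another alternative tried before `wordToken`.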