How to skip unwanted text with attoparsec

Date: 2015-12-31 15:33:22

Tags: parsing haskell attoparsec

I want to parse emails similar to this one:

Random Info: ........
From: email@domain.com
Other Info: .......
Subject: ima subject
Some More Info: .....

This is a message.

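But I don't need all of the information in this email, only "From", "Subject", and the message itself. How do I parse a message like this? More specifically, how do I skip the data I don't need?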

Here is my code so far:

{-# LANGUAGE OverloadedStrings #-}

module MailParser where

import qualified Data.Text as T
import Data.Attoparsec.Text

type From    = Address
type Message = T.Text
type Subject = T.Text
type Local   = T.Text
type Domain  = T.Text

data Address = Address Local Domain
data Mail = Mail From Subject Message

addressParser :: Parser Address
addressParser = do
    _ <- string "From: "
    local  <- takeWhile1 (/= '@')
    _ <- char '@'
    domain <- takeWhile1 (/= '\n')
    return $ Address local domain

subjectParser :: Parser T.Text
subjectParser = do
    _ <- string "Subject: "
    takeWhile1 (/= '\n')

messageParser :: Parser Message
messageParser = do
    _ <- char '\n'
    takeText

mailParser :: Parser Mail
mailParser = do
    -- skip unwanted info
    from <- addressParser
    -- skip unwanted info
    subject <- subjectParser
    -- skip unwanted info
    message <- messageParser
    return $ Mail from subject message
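One way to fill in the three "skip unwanted info" steps is a small combinator that tries a parser at the start of each line and discards whole lines until it matches. This is only a sketch under several assumptions: lines end in '\n' (real RFC 5322 messages use CRLF), From comes before Subject as in the example, and skipLine and skipUntil are hypothetical helpers written here, not part of attoparsec. The header parsers are variants of the ones above that also consume their trailing newline, so each parser starts at the beginning of a line.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ((<|>))
import Data.Attoparsec.Text
import qualified Data.Text as T

data Address = Address T.Text T.Text deriving (Show, Eq)
data Mail = Mail Address T.Text T.Text deriving (Show, Eq)

-- Hypothetical helper: skip the rest of the current line, including its '\n'.
-- It fails at end of input, so skipUntil below cannot loop forever.
skipLine :: Parser ()
skipLine = skipWhile (/= '\n') <* char '\n'

-- Hypothetical combinator: try p at the start of a line; if it fails,
-- drop that line and try again on the next one. attoparsec's (<|>)
-- backtracks automatically, so a failed attempt of p consumes no input.
skipUntil :: Parser a -> Parser a
skipUntil p = p <|> (skipLine *> skipUntil p)

-- Like the question's parser, but it also consumes the trailing '\n'.
addressParser :: Parser Address
addressParser = do
    _ <- string "From: "
    local <- takeWhile1 (/= '@')
    _ <- char '@'
    domain <- takeWhile1 (/= '\n')
    _ <- char '\n'
    return (Address local domain)

-- Subject value up to the end of the line, newline consumed.
subjectParser :: Parser T.Text
subjectParser = string "Subject: " *> takeWhile1 (/= '\n') <* char '\n'

-- An empty line separates the headers from the body; the body is the rest.
messageParser :: Parser T.Text
messageParser = char '\n' *> takeText

mailParser :: Parser Mail
mailParser = do
    from <- skipUntil addressParser
    subject <- skipUntil subjectParser
    message <- skipUntil messageParser
    return (Mail from subject message)
```

Running parseOnly mailParser on the example email (with a trailing newline after the last line) should yield Right (Mail (Address "email" "domain.com") "ima subject" "This is a message.\n"). Because skipUntil only tries p at line starts, a "From: " that appears inside another header's value is not matched.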

1 answer:

Answer 0 (score: 1)

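Instead of ignoring text in the parser, another approach is to parse the whole email normally and then project out only the headers you are interested in.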

{-# LANGUAGE OverloadedStrings #-}

import Data.List (find)
import Data.Text (Text)

-- After parsing, say we end up with this data.
data Email = Email { headers :: [Header], message :: Text }

data Header = Header { label :: Text, value :: Text }

-- Helper to find headers in an Email.
findHeader :: Text -> Email -> Maybe Header
findHeader x = find ((==x) . label) . headers

-- Project the Email's "Subject" header.
subject :: Email -> Maybe Header
subject = findHeader "Subject"

-- Project the Email's "From" header.
from :: Email -> Maybe Header
from = findHeader "From"
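A minimal sketch of the "parse normally" step with attoparsec, assuming every header is a single "Label: value" line ending in '\n' (no folded continuation lines, no CRLF) and that one empty line separates the headers from the body. The names headerParser and emailParser are hypothetical; the Email and Header types and findHeader are repeated from above so the block compiles on its own.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.Text
import Data.List (find)
import Data.Text (Text)

data Email = Email { headers :: [Header], message :: Text } deriving (Show, Eq)

data Header = Header { label :: Text, value :: Text } deriving (Show, Eq)

-- One "Label: value" line. The label stops at ':' and never crosses a
-- newline, so the empty line before the body is not taken as a header.
headerParser :: Parser Header
headerParser = do
    l <- takeWhile1 (\c -> c /= ':' && c /= '\n')
    _ <- string ": "
    v <- takeTill (== '\n')
    _ <- char '\n'
    return (Header l v)

-- All headers, then the empty separator line, then the rest as the body.
emailParser :: Parser Email
emailParser = Email <$> many' headerParser <* char '\n' <*> takeText

findHeader :: Text -> Email -> Maybe Header
findHeader x = find ((== x) . label) . headers
```

Every header is kept, so the unwanted ones are simply never looked up, and supporting another header later only needs another findHeader call rather than a new parser.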

Also, hsemail and hsemail-ns already provide Parsec parsers for email messages.