I want to parse emails that look like this:
Random Info: ........
From: email@domain.com
Other Info: .......
Subject: ima subject
Some More Info: .....
This is a message.
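But I don't need all of the information in this email, only the "From", the "Subject", and the message itself. How should I parse a message like this? More specifically, how do I skip the data I don't need?
Here is the code I have so far: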
{-# LANGUAGE OverloadedStrings #-}
module MailParser where
import qualified Data.Text as T
import Data.Attoparsec.Text
type From = Address
type Message = T.Text
type Subject = T.Text
type Local = T.Text
type Domain = T.Text
data Address = Address Local Domain
data Mail = Mail From Subject Message
addressParser :: Parser Address
addressParser = do
  _ <- string "From: "
  local <- takeWhile1 (/= '@')
  _ <- char '@'
  domain <- takeWhile1 (/= '\n')
  return $ Address local domain
subjectParser :: Parser T.Text
subjectParser = do
  _ <- string "Subject: "
  takeWhile1 (/= '\n')
messageParser :: Parser Message
messageParser = do
  _ <- char '\n'
  takeText
mailParser :: Parser Mail
mailParser = do
  -- skip unwanted info
  from <- addressParser
  -- skip unwanted info
  subject <- subjectParser
  -- skip unwanted info
  message <- messageParser
  return $ Mail from subject message
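Answer 0 (score: 1)
Instead of ignoring text inside the parser, another approach is to parse the email normally and then project only the headers you are interested in.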
{-# LANGUAGE OverloadedStrings #-}
import Data.List (find)
import Data.Text (Text)

-- After parsing, say we end up with this data.
data Email = Email { headers :: [Header], message :: Text }
data Header = Header { label :: Text, value :: Text }

-- Helper to find the first header with the given label in an Email.
findHeader :: Text -> Email -> Maybe Header
findHeader x = find ((== x) . label) . headers

-- Project the Email's "Subject" header.
subject :: Email -> Maybe Header
subject = findHeader "Subject"

-- Project the Email's "From" header.
from :: Email -> Maybe Header
from = findHeader "From"
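The snippet above assumes an `Email` value already exists. As a minimal sketch of how one could be produced with attoparsec, the block below parses every `Label: value` line into a `Header` and keeps the rest as the message. The names `headerParser`, `emailParser`, `headerValue`, and `sample` are hypothetical, and the sketch assumes the simplified format from the question: one header per line, no folded headers, and a body that starts at the first line that is not of the form `Label: value`. It needs the `attoparsec` and `text` packages.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Control.Applicative  (many)
import           Data.Attoparsec.Text
import           Data.List            (find)
import           Data.Text            (Text)
import qualified Data.Text            as T

data Header = Header { label :: Text, value :: Text } deriving (Eq, Show)
data Email  = Email { headers :: [Header], message :: Text } deriving (Eq, Show)

-- One "Label: value" line, including its line ending.
-- Assumption: a label never contains ':' and is followed by ": ".
headerParser :: Parser Header
headerParser = do
  l <- takeWhile1 (\c -> c /= ':' && not (isEndOfLine c))
  _ <- string ": "
  v <- takeTill isEndOfLine
  endOfLine
  return (Header l v)

-- Zero or more header lines, then everything that is left as the message.
-- attoparsec backtracks on failure, so the first non-header line is
-- left untouched for takeText.
emailParser :: Parser Email
emailParser = Email <$> many headerParser <*> takeText

-- Project the value of the first header with the given label.
headerValue :: Text -> Email -> Maybe Text
headerValue x = fmap value . find ((== x) . label) . headers

-- The example email from the question.
sample :: Text
sample = T.intercalate "\n"
  [ "Random Info: ........"
  , "From: email@domain.com"
  , "Other Info: ......."
  , "Subject: ima subject"
  , "Some More Info: ....."
  , "This is a message."
  ]
```

With these definitions, `parseOnly emailParser sample` yields a `Right` value, and `headerValue "Subject"` or `headerValue "From"` can then be applied to the result. A body whose first line happens to contain `": "` would be misread as a header; real emails separate headers from the body with a blank line, which this sketch does not model.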
In addition, hsemail and hsemail-ns already define Parsec-based email parsers.
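If skipping inside the parser is still preferred, as the question asks, it could look roughly like the following attoparsec sketch. The helpers `skipLine`, `skipTo`, and `headerLine` are hypothetical names, not part of attoparsec. The sketch assumes that "From" appears before "Subject", that each sits at the start of its own line, and that the message starts at the first line after "Subject" that is not of the form `Label: value`. For brevity the sender is kept as plain `Text` rather than split into local part and domain.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Control.Applicative  ((<|>))
import           Data.Attoparsec.Text
import           Data.Text            (Text)
import qualified Data.Text            as T

data Mail = Mail { mailFrom :: Text, mailSubject :: Text, mailBody :: Text }
  deriving (Eq, Show)

-- Discard one whole line, including its line ending.
skipLine :: Parser ()
skipLine = skipWhile (not . isEndOfLine) *> endOfLine

-- Try p at the start of the current line; if it fails, drop that line
-- and try again on the next one. Fails once the input runs out.
skipTo :: Parser a -> Parser a
skipTo p = p <|> (skipLine *> skipTo p)

fromLine :: Parser Text
fromLine = string "From: " *> takeTill isEndOfLine <* endOfLine

subjectLine :: Parser Text
subjectLine = string "Subject: " *> takeTill isEndOfLine <* endOfLine

-- Any "Label: value" line, discarded (assumed header shape).
headerLine :: Parser ()
headerLine =
  takeWhile1 (\c -> c /= ':' && not (isEndOfLine c)) *> string ": " *> skipLine

mailParser :: Parser Mail
mailParser = do
  f <- skipTo fromLine      -- skip unwanted lines before "From"
  s <- skipTo subjectLine   -- skip unwanted lines before "Subject"
  skipMany headerLine       -- skip remaining header-like lines
  b <- takeText             -- the rest is the message
  return (Mail f s b)

-- The example email from the question.
sample :: Text
sample = T.intercalate "\n"
  [ "Random Info: ........"
  , "From: email@domain.com"
  , "Other Info: ......."
  , "Subject: ima subject"
  , "Some More Info: ....."
  , "This is a message."
  ]
```

Running `parseOnly mailParser sample` should give a `Mail` holding the sender, the subject, and the message text. The trade-off compared with parsing every header first is that the order of the wanted fields is baked into the parser.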