A simple RSS downloader in Haskell

Time: 2013-06-11 07:45:35

Tags: haskell utf-8 rss

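Yesterday I tried to write a simple RSS downloader in Haskell with the help of the Network.HTTP and Feed libraries. I want to download the link from each RSS item and name the downloaded file after the item's title.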

Here is my short code:

import Control.Monad
import Control.Applicative
import Network.HTTP
import Text.Feed.Import
import Text.Feed.Query
import Text.Feed.Types
import Data.Maybe
import qualified Data.ByteString as B
import Network.URI (parseURI, uriToString)

getTitleAndUrl :: Item -> (Maybe String, Maybe String)
getTitleAndUrl item = (getItemTitle item, getItemLink item)

downloadUri :: (String,String) -> IO ()
downloadUri (title,link) = do
  file <- get link
  B.writeFile title file
    where
      get url = let uri = case parseURI url of
                      Nothing -> error $ "invalid uri: " ++ url
                      Just u -> u in
                simpleHTTP (defaultGETRequest_ uri) >>= getResponseBody

getTuples :: IO (Maybe [(Maybe String, Maybe String)])
getTuples = fmap (map getTitleAndUrl) <$> fmap (feedItems) <$> parseFeedString <$> (simpleHTTP (getRequest "http://index.hu/24ora/rss/") >>= getResponseBody)

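I have reached the point where I get a list of tuples, each containing a name and the corresponding link. I also have a downloadUri function that correctly downloads a given link to a file named after the RSS item's title.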

I have already tried to modify downloadUri to work with (Maybe String, Maybe String) by fmap-ing over get and writeFile, but I could not get it to work.

  • How can I apply the downloadUri function to the result of the getTuples function? I want to implement the following main function:

    main :: IO ()
    main = some magic incantation downloadUri more incantation getTuples

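  • The character encoding of the getItemTitle result is broken: it puts code points in place of the accented characters. The feed is UTF-8 encoded, and I thought all Haskell string functions defaulted to UTF-8. How can I fix this?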

Edit

Thanks to your help, I managed to implement my main and helper functions. The code is as follows:

downloadUri :: (Maybe String,Maybe String) -> IO ()
downloadUri (Just title,Just link) = do
  item <- get link
  B.writeFile title item
    where
      get url = let uri = case parseURI url of
                      Nothing -> error $ "invalid uri: " ++ url
                      Just u -> u in
                simpleHTTP (defaultGETRequest_ uri) >>= getResponseBody
downloadUri _ = print "Somewhere something went Nothing"

getTuples :: IO (Maybe [(Maybe String, Maybe String)])
getTuples = fmap (map getTitleAndUrl) <$> fmap (feedItems) <$> parseFeedString <$> decodeString <$> (simpleHTTP (getRequest "http://index.hu/24ora/rss/") >>= getResponseBody)

downloadAllItems :: Maybe [(Maybe String, Maybe String)] -> IO ()
downloadAllItems (Just feedlist) = mapM_ downloadUri $ feedlist
downloadAllItems _ = error "feed does not get parsed"

main = getTuples >>= downloadAllItems

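The character-encoding problem is partially solved: I put decodeString in front of the feed parsing, so the files are now named correctly. But if I want to print the titles, the problem is still there. Minimal working example: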

main = getTuples

2 Answers:

Answer 0 (score: 2)

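It sounds like the Maybes are what is giving you trouble. There are many ways to deal with Maybe values, and there are some useful library functions such as fromMaybe and fromJust. However, the simplest way is to pattern match on the Maybe value. We can tweak your downloadUri function to work with Maybe values. Here is an example: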

downloadUri :: (Maybe String, Maybe String) -> IO ()
downloadUri (Just title, Just link) = do
  file <- get link
  B.writeFile title file
    where
      get url = let uri = case parseURI url of
                      Nothing -> error $ "invalid uri: " ++ url
                      Just u -> u in
                simpleHTTP (defaultGETRequest_ uri) >>= getResponseBody
downloadUri _ = error "One of my parameters was Nothing"

Or maybe you want the title to default to blank, in which case you could insert this just before the last line of the previous example:

downloadUri (Nothing, Just link) = downloadUri (Just "", Just link)
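
The same defaulting can also be written with fromMaybe, mentioned earlier. The following is only a minimal sketch: resolveItem and the "untitled" placeholder are hypothetical names that do not appear in the original code, and the placeholder is used instead of "" because an empty file name is likely to be rejected when the file is opened for writing.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical helper (not part of the original post): turn the pair of
-- Maybes into a file name and a link, substituting a default name when the
-- title is missing. An item without a link cannot be downloaded at all.
resolveItem :: (Maybe String, Maybe String) -> Maybe (FilePath, String)
resolveItem (mTitle, Just link) = Just (fromMaybe "untitled" mTitle, link)
resolveItem (_, Nothing)        = Nothing

main :: IO ()
main = do
  print (resolveItem (Just "news", Just "http://example.com/a"))
  print (resolveItem (Nothing, Just "http://example.com/b"))
  print (resolveItem (Just "no link", Nothing))
```

Note that several untitled items would then overwrite the same file, so a real program would probably want a distinct fallback name per item.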

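Now the only Maybe you still need to deal with is the outer one, applied to the list of tuples. Again, we can pattern match. It is probably clearest to write a helper function like this:

downloadAllItems (Just ts) = ??? -- hint: try a `mapM`
downloadAllItems Nothing = ??? -- don't do anything, or report an error, or...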
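
One way the helper could be filled in is sketched below, using mapM_ because the results of the individual downloads are not needed. To keep the sketch runnable without network access, downloadUri here is a stand-in that only reports what it would do, and describeItem is a hypothetical helper introduced for illustration.

```haskell
-- Hypothetical pure helper: describe what would happen to one feed item.
describeItem :: (Maybe String, Maybe String) -> String
describeItem (Just title, Just link) = "would save " ++ link ++ " as " ++ title
describeItem _                       = "skipping item with missing title or link"

-- Stand-in for the real downloadUri; it prints instead of downloading.
downloadUri :: (Maybe String, Maybe String) -> IO ()
downloadUri = putStrLn . describeItem

-- Pattern match on the outer Maybe, then run the action on every tuple.
downloadAllItems :: Maybe [(Maybe String, Maybe String)] -> IO ()
downloadAllItems (Just ts) = mapM_ downloadUri ts
downloadAllItems Nothing   = putStrLn "feed could not be parsed"

main :: IO ()
main = do
  downloadAllItems (Just [ (Just "a", Just "http://example.com/a")
                         , (Nothing, Just "http://example.com/b") ])
  downloadAllItems Nothing
```

With this in place, main = getTuples >>= downloadAllItems wires the two pieces together.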

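As for your encoding problem, my guesses are:

  1. You are reading the information from a file that is not UTF-8 encoded, or your system does not realize that it is UTF-8 encoded.
  2. You are reading the information correctly, but it gets messed up when you output it.

To help you with this problem, I would need to see a full code example that shows how you are reading the information and how you are outputting it.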
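
One possible cause of the second kind, offered as a guess rather than a confirmed diagnosis: print (and GHCi's display of a result) goes through show, and show on a String escapes every non-ASCII character as its numeric code point. That would match "code points in place of the accented characters" even when the String itself was decoded correctly. A minimal sketch of the difference, assuming UTF-8 is the desired output encoding:

```haskell
import System.IO (hSetEncoding, stdout, utf8)

main :: IO ()
main = do
  -- Force UTF-8 on stdout regardless of the locale settings
  -- (an assumption about the environment, not something from the post).
  hSetEncoding stdout utf8
  let title = "\225rv\237z"  -- the word "árvíz", written with escapes
  print title     -- uses show, so non-ASCII characters appear as \225 etc.
  putStrLn title  -- writes the characters themselves
```

If this is what is happening, printing each title with putStrLn instead of printing the whole structure with print should show the accented characters.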

Answer 1 (score: 1)

Your main could look something like the following. There may be a more concise way to compose these two operations:

main :: IO ()
main = getTuples >>= process
       where
           process (Just lst) = foldl (\s v -> do {t <- s; download v}) (return ()) lst 
           process Nothing = return ()
           download (Just t, Just l) = downloadUri (t,l)
           download _ = return ()
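
A more concise variant is sketched here under the same assumptions: the fold is replaced by mapM_, and incomplete items are dropped by a list comprehension whose pattern only matches when both title and link are present. The names complete and process are illustrative, and downloadUri is a stand-in with the question's original type (String, String) -> IO () that prints instead of downloading.

```haskell
-- Stand-in for the question's downloadUri so the sketch runs offline.
downloadUri :: (String, String) -> IO ()
downloadUri (title, link) = putStrLn ("would save " ++ link ++ " as " ++ title)

-- Keep only the items that have both a title and a link; tuples that do
-- not match the pattern are silently skipped by the comprehension.
complete :: [(Maybe String, Maybe String)] -> [(String, String)]
complete items = [(t, l) | (Just t, Just l) <- items]

-- An unparsable feed (Nothing) is treated as an empty list of items.
process :: Maybe [(Maybe String, Maybe String)] -> IO ()
process = mapM_ downloadUri . maybe [] complete

main :: IO ()
main = process (Just [ (Just "a", Just "http://example.com/a")
                     , (Nothing, Just "http://example.com/b") ])
```

In the real program, main = getTuples >>= process would replace the hard-coded list.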