Haskell: read a CSV file -> load XML files from a URL -> write a CSV file out again

Time: 2016-06-16 07:04:52

Tags: xml csv haskell hxt

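I am trying to

  1. load a CSV file
  2. read the IDs from the file
  3. for each ID, load an external XML file
  4. read some names from the XML
  5. write the IDs and names to a new CSV file

I am new to Haskell and keen to learn it; I'm still at the copy-and-paste stage of understanding. I have found tutorials for each part on its own, but I'm struggling to combine them.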

The CSV is very simple, for example:

    736572,"Mount Athos"
    6697806,"North Aegean"
    

I use Cassava to read the CSV and HandsomeSoup to read the XML.
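To illustrate the Cassava side on its own, here is a minimal sketch that decodes the two sample rows from memory rather than from input.csv. `sample` and `parseRows` are names made up for this sketch; `decode` and `NoHeader` are the Cassava functions already used in this question.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Csv (HasHeader (NoHeader), decode)
import qualified Data.Vector as V

-- Two rows in the same shape as input.csv, kept in memory for this sketch.
sample :: BL.ByteString
sample = BL.pack "736572,\"Mount Athos\"\n6697806,\"North Aegean\"\n"

-- Decode every row as an (id, name) pair of Strings.
-- A parse failure is reported as Left with an error message.
parseRows :: BL.ByteString -> Either String (V.Vector (String, String))
parseRows = decode NoHeader

main :: IO ()
main = case parseRows sample of
    Left err   -> putStrLn err
    Right rows -> V.forM_ rows $ \(pid, name) ->
        putStrLn (pid ++ " -> " ++ name)
```

Giving `parseRows` an explicit type tells Cassava which record shape to decode, so no type annotation is needed inside the lambda.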

Here I try to read the IDs, load the XML, and at least print the name from the XML.

    {-# LANGUAGE ScopedTypeVariables #-}
    
    import qualified Data.ByteString.Lazy as BL
    import Data.Csv
    import qualified Data.Vector as V
    
    import Text.XML.HXT.Core
    import Text.HandsomeSoup
    
    import Data.List
    import Data.Char
    
    
    getPlaceNames::String->String->String
    getPlaceNames pid name = do
        let doc = fromUrl ("http://api.geonames.org/get?geonameId="++pid++"&username=demo")
    
        c<-runX $ doc >>> css "alternateNames" >>> deep getText
        return (head c)
    
    
    main :: IO ()
    main = do
        csvData <- BL.readFile "input.csv"
        case decode NoHeader csvData of
            Left err -> putStrLn err
            Right v -> V.forM_ v $ \ ( pid, name ) ->
              putStrLn $  getPlaceNames pid name
    

I think I am doing something wrong when I call getPlaceNames and return the name. I am not even sure whether I should be using a 'do' block in getPlaceNames.

The error says:

     Couldn't match expected type ‘[[Char]]’
                with actual type ‘IO [String]’
    In a stmt of a 'do' block:
      c <- runX $ doc >>> css "alternateNames" >>> deep getText
    In the expression:
      do { let doc
                 = fromUrl
                     ("http://api.geonames.org/get?geonameId="
                      ++ pid ++ "&username=demo");
           c <- runX $ doc >>> css "alternateNames" >>> deep getText;
           return (head c) }
    In an equation for ‘getPlaceNames’:
        getPlaceNames pid name
          = do { let doc = ...;
                 c <- runX $ doc >>> css "alternateNames" >>> deep getText;
                 return (head c) }
    

But this is probably only one of the things I am doing wrong, given my lack of understanding of monads and binding.

Any help is appreciated, even if it is just a pointer to the right documentation.

Cheers

Björn

1 answer:

Answer 0 (score: 1)

Thanks to chi, I have worked out the whole process. I am posting my code for anyone else who needs to do something similar.
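The type error in the question arises because getPlaceNames is declared to return String while its body runs `runX`, which is an IO action. Below is a minimal sketch of the shape of the fix, using only base. `fetchNames` is a hypothetical stand-in for the `runX` call (it does no network access), and `getPlaceName` is an illustrative name, not the code from this answer.

```haskell
import Data.Maybe (fromMaybe, listToMaybe)

-- Hypothetical stand-in for `runX $ doc >>> css "name" >>> deep getText`:
-- like runX, it delivers its list of results inside IO.
fetchNames :: String -> IO [String]
fetchNames pid = return ["name-for-" ++ pid]

-- The result type is IO String, not String, because the body runs an IO action.
getPlaceName :: String -> IO String
getPlaceName pid = do
    names <- fetchNames pid                    -- bind the IO result to a plain [String]
    return (fromMaybe "" (listToMaybe names))  -- wrap the pure value back into IO

main :: IO ()
main = do
    name <- getPlaceName "736572"  -- callers bind the result with <- as well
    putStrLn name
```

The same pattern applies in `main`: the result of an IO function cannot be passed straight to `putStrLn`; it has to be bound with `<-` first.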

In the end I fetched several fields from the XML, not just the name, so I renamed getPlaceNames to getPlaceDetails.

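I am showing the complete code because it also shows how I read different fields from the XML and how I merged the alternateName elements from the XML into a single string.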

{-# LANGUAGE ScopedTypeVariables #-}


import qualified Data.ByteString.Lazy.Char8 as BL


import Data.Csv
import qualified Data.Vector as V

import Text.XML.HXT.Core
import Text.HandsomeSoup
import Data.List
import Data.Char


uppercase :: String -> String
uppercase = map toUpper


toLanguageStr :: (String, String) -> String
toLanguageStr (lan,name) = uppercase lan ++ ":" ++ name


-- Fetch the GeoNames record for one ID and pull out the fields written to the CSV.
-- Each `runX $ doc >>> ...` runs the `doc` arrow again, so the document is fetched once per field.
-- `head` and `read` fail at runtime if a field is missing or is not a number.
getPlaceDetails :: String -> String -> IO (Int,String,Float,Float,Float,Float,Float,Float,String,String)
getPlaceDetails pid _csvName = do
    let doc = fromUrl ("http://api.geonames.org/get?geonameId="++pid++"&username=demo")

    gid   <- runX $ doc >>> css "geonameId" >>> deep getText
    names <- runX $ doc >>> css "name" >>> deep getText
    s     <- runX $ doc >>> css "south" >>> deep getText
    w     <- runX $ doc >>> css "west" >>> deep getText
    n     <- runX $ doc >>> css "north" >>> deep getText
    e     <- runX $ doc >>> css "east" >>> deep getText
    lat   <- runX $ doc >>> css "lat" >>> deep getText
    lng   <- runX $ doc >>> css "lng" >>> deep getText
    -- one (lang attribute, text) pair per alternateName element
    translations <- runX $ doc >>> css "alternateName" >>> (getAttrValue "lang" &&& deep getText)
    -- the text of the alternateNames element
    terms <- runX $ doc >>> css "alternateNames" >>> deep getText
    return ( read (head gid), head names, read (head lat), read (head lng), read (head s), read (head w), read (head n), read (head e), intercalate "|" $ map toLanguageStr translations, head terms )



main :: IO ()
main = do
    csvData <- BL.readFile "input.csv"
    case decode NoHeader csvData of
        Left err -> putStrLn err
        Right v -> V.forM_ v $ \ ( pid, name )->do
            details <- getPlaceDetails pid name
            BL.appendFile "out.csv" $ encode [details]
            BL.putStrLn  (encode [details]) 
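
The code above relies on `head` and `read`, which crash if an element is missing or its text is not a number, for instance when the service returns an error document instead of a place record. A hedged sketch of a safer variant, using only base: `firstNumber` and `latLng` are names invented for this sketch, and the `[String]` arguments mirror what each `runX` call returns.

```haskell
import Data.Maybe (listToMaybe)
import Text.Read (readMaybe)

-- Parse the first text node of a field, if there is one and it is a number.
firstNumber :: [String] -> Maybe Float
firstNumber texts = listToMaybe texts >>= readMaybe

-- Combine two optional coordinates; Nothing if either one is missing or malformed.
latLng :: [String] -> [String] -> Maybe (Float, Float)
latLng lats lngs = (,) <$> firstNumber lats <*> firstNumber lngs

main :: IO ()
main = do
    print (latLng ["40.15798"] ["24.33021"])  -- both fields present
    print (latLng [] ["24.33021"])            -- latitude missing
    print (firstNumber ["not a number"])      -- malformed text
```

With this shape, a row whose record is incomplete can be skipped or logged instead of aborting the whole run.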

For example, the input.csv line

736572,"Mount Athos"

maps to this line in out.csv:

736572,"Mount Athos",40.15798,24.33021,40.11294,23.99234,40.4563,24.40044,"KO:아토스 산|:Aftónomos Periochí Agíou Órous|:Ágion Óros|:Ágio Óros|:Athos|NO:Áthos|EN:Autonomous Monastic State of the Holy Mountain|:Avtonómos Periokhí Ayíou Órous|:Áyion Óros|:Dhioíkisis Ayíou Órous|:Hagion Oros|:Holy Athonite Republic|LINK:http://en.wikipedia.org/wiki/Mount_Athos|CA:Mont Athos|FR:Mont Athos|EN:Mount Athos|FR:République monastique du Mont Athos|EL:Αυτόνομη Μοναστική Πολιτεία Αγίου Όρους","Aftonomos Periochi Agiou Orous,Aftónomos Periochí Agíou Órous,Agio Oros,Agion Oros,Athos,Autonome Monastike Politeia Agiou Orous,Autonomous Monastic State of the Holy Mountain,Avtonomos Periokhi Ayiou Orous,Avtonómos Periokhí Ayíou Órous,Ayion Oros,Dhioikisis Ayiou Orous,Dhioíkisis Ayíou Órous,Hagion Oros,Holy Athonite Republic,Mont Athos,Mount Athos,Republique monastique du Mont Athos,République monastique du Mont Athos,atoseu san,Ágio Óros,Ágion Óros,Áthos,Áyion Óros,Αυτόνομη Μοναστική Πολιτεία Αγίου Όρους,아토스 산"
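
The translations column in that output is built from (language, name) pairs joined with `|`; entries without a `lang` attribute show up with an empty prefix such as `:Athos`. A small sketch of just that step, using only base: `toLanguageStr` follows the helper in the answer, while `joinTranslations` and the sample pairs are made up for illustration.

```haskell
import Data.Char (toUpper)
import Data.List (intercalate)

-- Same idea as the helper in the answer: upper-case the language code, then "CODE:name".
toLanguageStr :: (String, String) -> String
toLanguageStr (lan, name) = map toUpper lan ++ ":" ++ name

-- Join all formatted pairs with '|', as done for the translations column.
joinTranslations :: [(String, String)] -> String
joinTranslations = intercalate "|" . map toLanguageStr

main :: IO ()
main = putStrLn (joinTranslations [("fr", "Mont Athos"), ("", "Athos"), ("en", "Mount Athos")])
-- prints: FR:Mont Athos|:Athos|EN:Mount Athos
```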