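I've been struggling with this conduit code for a while and would really appreciate some help. It feels like this kind of code evolves through random mutation, with the type checker enforcing natural selection. Here is one of the fittest candidates so far: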
import Conduit
import qualified Data.Conduit.Combinators as DCC
import Data.CSV.Conduit
import Data.Function ((&))
import Data.List.Split (splitOn)
import Data.Map as DM
import Data.Text (Text)
import qualified Data.Text as Txt
import qualified Data.Text.IO as DTIO
import Data.Vector (Vector)
import qualified Data.Vector as DV
import Path
import System.FilePath.Posix

retrieveSmaXtec :: Path Abs Dir -> IO (Vector (MapRow Text))
retrieveSmaXtec sxDir = do
  files <- sourceDirectoryDeep False (fromAbsDir sxDir) & return
  fileVector <- return $ runConduit $ files .| sinkVector
  csvRowsByFile <- runConduit ((yieldM fileVector) .| DCC.mapM processCSV .| sinkVector)
  fNameRows <- readFnameData $ yieldM fileVector
  (pairFill fNameRows csvRowsByFile)
    & fmap (uncurry DM.union)
    & return
  where
    fileList :: Path Abs Dir -> IO (Vector FilePath)
    fileList dir = sourceDirectoryDeep False (fromAbsDir sxDir) .| sinkVector & runConduit
    expandZip :: MapRow Text -> Vector (MapRow Text) -> Vector (MapRow Text, MapRow Text)
    expandZip one many = zip (replicate mlen one) many
      where
        mlen = length many
    pairFill :: Vector (MapRow Text) -> Vector (Vector (MapRow Text)) -> Vector (MapRow Text, MapRow Text)
    pairFill ones manies = join $ fmap (uncurry expandZip) (zip ones manies)
    processCSV :: FilePath -> IO (Vector (MapRow Text))
    processCSV fp = sourceFile fp
      .| intoCSV defCSVSettings
      .| sinkVector
      & runConduitRes
    readFnameData :: (MonadThrow m, MonadResource m, PrimMonad m) => ConduitT () FilePath m () -> m (Vector (MapRow Text))
    readFnameData files = runConduit $ files .| processFileName .| sinkVector
    processFileName :: (MonadResource m, MonadThrow m, PrimMonad m) =>
      ConduitT FilePath (MapRow Text) m ()
    processFileName = mapC go
      where
        go :: FilePath -> MapRow Text
        go fp = takeFileName fp
          & takeWhile (/= '.')
          & splitOn "_"
          & fmap Txt.pack
          & zip colNames
          & DM.fromList
        colNames = [markKey, idKey]
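The current point of confusion, which shows up in both errors below, is that [FilePath] appears where I expect everything to be just FilePath. Even if this gets fixed, I wouldn't be surprised if other errors pop up, so if there is a solution that requires some rework, I'm happy to try it.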
    * Couldn't match type `Char' with `[Char]'
      Expected type: ConduitM
                       [FilePath] Void IO (Vector (Vector (MapRow Text)))
        Actual type: ConduitM
                       FilePath Void IO (Vector (Vector (MapRow Text)))
    * In the second argument of `(.|)', namely
        `DCC.mapM processCSV .| sinkVector'
      In the first argument of `runConduit', namely
        `((yieldM fileVector) .| DCC.mapM processCSV .| sinkVector)'
      In a stmt of a 'do' block:
        csvRowsByFile <- runConduit
                           ((yieldM fileVector) .| DCC.mapM processCSV .| sinkVector)
   |
40 |   csvRowsByFile <- runConduit ((yieldM fileVector) .| DCC.mapM processCSV .| sinkVector)
   |                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    * Couldn't match type `[Char]' with `Char'
      Expected type: ConduitT () FilePath IO ()
        Actual type: ConduitT () [FilePath] IO ()
    * In the second argument of `($)', namely `yieldM fileVector'
      In a stmt of a 'do' block:
        fNameRows <- readFnameData $ yieldM fileVector
      In the expression:
        do files <- sourceDirectoryDeep False (fromAbsDir sxDir) & return
           fileVector <- return $ runConduit $ files .| sinkVector
           csvRowsByFile <- runConduit
                              ((yieldM fileVector) .| DCC.mapM processCSV .| sinkVector)
           fNameRows <- readFnameData $ yieldM fileVector
           ....
   |
41 |   fNameRows <- readFnameData $ yieldM fileVector
   |                                ^^^^^^^^^^^^^^^^^
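A likely source of the [FilePath]-vs-FilePath mismatch: `yieldM` (like `yield`) emits its result as a single stream element, so `yieldM fileVector` sends the whole collection of paths downstream as one value, while `DCC.mapM processCSV` and `readFnameData` expect one FilePath per element. (Also, `fileVector <- return $ runConduit ...` binds the un-run action rather than its result.) The sketch below contrasts `yield`/`yieldM` with `yieldMany`, which streams a container element by element. It assumes only the `conduit` and `vector` packages, and the hardcoded `paths` list is a hypothetical stand-in for the directory listing.

```haskell
import Conduit
import qualified Data.Vector as V

-- Hypothetical stand-in for the directory listing.
paths :: [FilePath]
paths = ["A1_100.csv", "B2_200.csv", "C3_300.csv"]

-- `yield` emits its argument as ONE stream element, so this stream's
-- element type is [FilePath]: a single list holding every path.
oneChunk :: [[FilePath]]
oneChunk = runConduitPure $ yield paths .| sinkList

-- `yieldM` does the same for a monadic action: the action's whole result
-- (here a Vector) becomes one element, as with `yieldM fileVector`.
oneVector :: [V.Vector FilePath]
oneVector = runConduitPure $ yieldM (pure (V.fromList paths)) .| sinkList

-- `yieldMany` streams each element of a container separately, so
-- downstream components such as `mapC` or `mapMC` receive plain FilePaths.
perPath :: [FilePath]
perPath = runConduitPure $ yieldMany paths .| sinkList

-- `yieldMany` also streams a Vector element by element.
perPathFromVector :: [FilePath]
perPathFromVector = runConduitPure $ yieldMany (V.fromList paths) .| sinkList
```

Loaded into GHCi, `oneChunk` is a one-element list containing all three paths, while `perPath` is just the three paths. Swapping `yieldM` for something that streams individual paths (or reusing the directory source directly) is what makes the element types line up.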
This question started out in another form as How to merge one-to-one and one-to-many input:output relationships in conduit?, but now I'm just trying to get it working, anyhow.
Answer (score: 0)
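After sleeping on it and spending more time, I came up with a solution. I still don't fully understand why some of the things I tried didn't work, but I'm quite happy with the end result (if not with the path I took to get there; at least sometimes learning is pain). The main difference here is that I decided to reuse the sourceDirectoryDeep conduit (now files) instead of trying to convert it directly into a vector. I also had to be smarter about writing processCSV, though I took one wrong turn along the way that still puzzles me (Why can one sometimes get "No instance for CSV Text Text arising from a use of `intoCSV`" when using csv-conduit?).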
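The key idea above, that a source such as files is an ordinary value which can be plugged into several independent pipelines, can be sketched in isolation. This is a minimal sketch assuming only the `conduit` package; the hardcoded list is a hypothetical stand-in for `sourceDirectoryDeep`, and pure `runConduitPure` pipelines stand in for the `runConduitRes` calls in the real code.

```haskell
import Conduit
import Data.Char (toUpper)

-- Hypothetical stand-in for `sourceDirectoryDeep False dir`: a source is
-- a value describing a stream, not the stream's contents.
files :: Monad m => ConduitT () FilePath m ()
files = yieldMany ["A1_100.csv", "B2_200.csv"]

-- First consumer: collect the paths unchanged.
allPaths :: [FilePath]
allPaths = runConduitPure $ files .| sinkList

-- Second consumer: the same `files` value, run again from the start in a
-- separate pipeline, with a per-element transformation in between.
upperPaths :: [FilePath]
upperPaths = runConduitPure $ files .| mapC (map toUpper) .| sinkList
```

With the real `sourceDirectoryDeep`, each run walks the directory again. The zip inside pairFill therefore relies on both walks returning files in the same order; that is a reasonable expectation for an unchanged directory, but it is an assumption the types do not enforce.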
retrieveSmaXtec :: Path Abs Dir -> IO (Vector SxRecord)
retrieveSmaXtec sxDir = do
  csvRows <- getCsvRows
  fnameRows <- getFileNameRows
  rows <- return $ pairFill fnameRows csvRows & fmap (uncurry DM.union)
  print rows
  rows & fmap fromRow & catMaybes & return
  where
    getCsvRows :: IO (Vector (Vector (MapRow Text)))
    getCsvRows = files .| processCSV & runConduitRes
    getFileNameRows :: IO (Vector (MapRow Text))
    getFileNameRows = files .| processFileName & runConduitRes
    files :: MonadResource m => ConduitT () FilePath m ()
    files = sourceDirectoryDeep False (fromAbsDir sxDir)
    expandZip :: MapRow Text -> Vector (MapRow Text) -> Vector (MapRow Text, MapRow Text)
    expandZip one many_ = zip (replicate mlen one) many_
      where
        mlen = length many_
    pairFill :: Vector (MapRow Text) -> Vector (Vector (MapRow Text)) -> Vector (MapRow Text, MapRow Text)
    pairFill ones manies = join $ fmap (uncurry expandZip) (zip ones manies)
    processCSV :: (MonadResource m, MonadThrow m, PrimMonad m) =>
      ConduitT FilePath Void m (Vector (Vector (MapRow Text)))
    processCSV = mapMC (readCSVFile defCSVSettings) .| sinkVector
    processFileName :: (MonadResource m, MonadThrow m, PrimMonad m) =>
      ConduitT FilePath Void m (Vector (MapRow Text))
    processFileName = mapC go
      .| sinkVector
      where
        go :: FilePath -> MapRow Text
        go fp = takeFileName fp
          & takeWhile (/= '.')
          & splitOn "_"
          & fmap Txt.pack
          & zip colNames
          & DM.fromList
        colNames = [markKey, idKey]
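As an alternative (not the approach taken in the answer above), the directory could be walked only once, merging each file's name fields into its CSV rows as they stream past. This avoids depending on two separate walks producing the same file order for pairFill's zip. A minimal sketch, assuming the `conduit`, `containers`, `text` and `filepath` packages: the column names "mark" and "id" are hypothetical stand-ins for markKey and idKey, Data.Text's own splitOn replaces Data.List.Split, and the row reader is passed in as a parameter so the sketch does not depend on csv-conduit.

```haskell
import Conduit
import qualified Data.Map as M
import Data.Text (Text)
import qualified Data.Text as T
import System.FilePath (takeFileName)

-- Same shape as csv-conduit's `MapRow Text` (column name -> value).
type Row = M.Map Text Text

-- Hypothetical column names standing in for `markKey` and `idKey`.
colNames :: [Text]
colNames = [T.pack "mark", T.pack "id"]

-- Mirrors `processFileName`'s `go`, but splits with Data.Text.splitOn:
-- "dir/A1_100.csv" becomes fromList [("id","100"),("mark","A1")].
fileNameRow :: FilePath -> Row
fileNameRow fp =
  M.fromList (zip colNames (T.splitOn (T.pack "_") (T.pack base)))
  where
    base = takeWhile (/= '.') (takeFileName fp)

-- Single pass over the paths: read each file's rows with the supplied
-- reader, then left-biased union the filename fields into every row
-- (the same merge as `uncurry DM.union`, with filename fields winning).
withFileNameFields :: Monad m => (FilePath -> m [Row]) -> ConduitT FilePath Row m ()
withFileNameFields readRows =
  mapMC (\fp -> map (M.union (fileNameRow fp)) <$> readRows fp)
    .| concatC
```

In the real code the reader would presumably be something like `fmap DV.toList . readCSVFile defCSVSettings`, run as `runConduitRes $ sourceDirectoryDeep False (fromAbsDir sxDir) .| withFileNameFields reader .| sinkVector`; that wiring is an untested suggestion. The design choice is to pair each file's name with its own rows at the point where both are known, instead of reconstructing the pairing afterwards by position.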