Rearranging a string that contains variable-length repeating patterns

Time: 2010-08-03 03:08:06

Tags: parsing list haskell

I have a file with the following layout:

TABLE name_of_table
COLUMNS first_column 2nd_column [..] n-th_column
VALUES 1st_value 2nd_value [...] n-th_value
VALUES yet_another_value ... go on
... another table follows, repeating from the start ...

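I want to rearrange this text file so that I don't have to type TABLE and COLUMNS in front of every VALUES line by hand, producing:

TABLE name_of_table COLUMNS first_column [..] n-th_column VALUES 1st_value
TABLE name_of_table COLUMNS first_column [..] n-th_column VALUES yet_another_value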

I need to take this input and rearrange several lines at once, so reading the whole text file into a string with hGetContents seemed appropriate, yielding a string like this:

TABLE name_of_table COLUMNS first_column [..] n-th_column VALUES 1st_value [..] n-th_value VALUES another_value [..] yet_another VALUES ... another TABLE ... COLUMNS ... VALUES [...] VALUES ...

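I have tried using nested case expressions and recursion. That leaves me with a dilemma I need help with:

1) I need recursion to avoid endlessly nested case expressions.

2) With recursion, I cannot prepend the earlier part of the string again, because the recursive call only sees the tail of my string!

To illustrate the problem: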

myStr::[[Char]]->[[Char]]
myStr [] = []
myStr one = case (head one) of
    "table" -> "insert into":(head two):columnRecursion (three) ++
        case (head four) of
            "values" -> (head four):valueRecursion (tail three) ++ myStr (tail four)
            _ -> case head (tail four) of
                "values" -> (head (tail four):myStr (tail (tail four))
                _ ->
    where two = tail one
          three = tail two
          four = tail three

columnRecursion::[[Char]] -> [[Char]]
columnRecursion [] = []
columnRecursion cool = case (head cool) of
    "columns" -> "(":columnRecursion (tail cool)
    "values" -> [")"]
    _ -> (head cool):columnRecursion (tail cool)

valueRecursion::[[Char]] -> [[Char]]
valueRecursion foo = case head foo of
    "values" -> "insert into":(head two):columnRecursion (three) ++ valueRecursion (tail foo)
    "table" -> []
    "columns"-> []
    _ -> (head foo):valueRecursion (tail foo)

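I end up with FIRSTPART, VALUES, and I need to get hold of FIRSTPART again in order to produce FIRSTPART, VALUES, FIRSTPART, VALUES, FIRSTPART, VALUES.

Trying to do this by referring to myStr from inside valueRecursion obviously fails, because the names I need are out of scope there.

How can I do this?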

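2 answers:

Answer 0 (score: 2)

To me, this kind of problem is just past the threshold where a real parsing tool becomes worthwhile. Here is a quick working example with Attoparsec: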

import Control.Applicative
import Data.Attoparsec (maybeResult)
import Data.Attoparsec.Char8
import qualified Data.Attoparsec.Char8 as A (takeWhile)
import qualified Data.ByteString.Char8 as B
import Data.Maybe (fromMaybe)

data Entry = Entry String [String] [[String]] deriving (Show)

entry = Entry <$> table <*> cols <*> many1 vals
items = sepBy1 (A.takeWhile $ notInClass " \n") $ char ' '
table = string (B.pack "TABLE ") *> many1 (notChar '\n') <* endOfLine
cols = string (B.pack "COLUMNS ") *> (map B.unpack <$> items) <* endOfLine
vals = string (B.pack "VALUES ")  *> (map B.unpack <$> items) <* endOfLine

parseEntries :: B.ByteString -> Maybe [Entry]
parseEntries = maybeResult . flip feed B.empty . parse (sepBy1 entry skipSpace)

And a bit more machinery:

pretty :: Entry -> String
pretty (Entry t cs vs)
  = unwords $ ["TABLE", t, "COLUMNS"]
  ++ cs ++ concatMap ("VALUES" :) vs

layout :: B.ByteString -> Maybe String
layout = (unlines . map pretty <$>) . parseEntries

testLayout :: FilePath -> IO ()
testLayout f = putStr . fromMaybe [] =<< layout <$> B.readFile f

Given this input:

TABLE test
COLUMNS a b c
VALUES 1 2 3
VALUES 4 5 6

TABLE another
COLUMNS x y z q
VALUES 7 8 9 10
VALUES 1 2 3 4

we get the following:

*Main> testLayout "test.dat" 
TABLE test COLUMNS a b c VALUES 1 2 3 VALUES 4 5 6
TABLE another COLUMNS x y z q VALUES 7 8 9 10 VALUES 1 2 3 4

This seems to be what you want?
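Note that `pretty` puts every VALUES group of a table on one line, whereas the question asks for the TABLE and COLUMNS prefix to be repeated in front of each VALUES row. A minimal sketch of that variant follows, assuming the same `Entry` shape as above. `prettyRows` is a hypothetical name that is not part of the answer's code, and `Entry` is redefined here only so the block compiles on its own.

```haskell
-- Same shape as the Entry type above, repeated so the sketch stands alone.
data Entry = Entry String [String] [[String]] deriving (Show)

-- Hypothetical helper: one output line per VALUES row,
-- each prefixed with the table name and the column list.
prettyRows :: Entry -> [String]
prettyRows (Entry t cs vs) =
  [ unwords (["TABLE", t, "COLUMNS"] ++ cs ++ ("VALUES" : v)) | v <- vs ]
```

Under these assumptions, replacing `map pretty` with `concatMap prettyRows` in `layout` should print one line per row.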

Answer 1 (score: 0)

This answer is literate Haskell, so you can copy and paste it into a file named table.lhs to get a working program.

Start with a few imports

> import Control.Arrow ((&&&))
> import Control.Monad (forM_)
> import Data.List (intercalate,isPrefixOf)
> import Data.Maybe (fromJust)

and say we represent a table with the following record:

> data Table = Table { tblName :: String
>                    , tblCols :: [String]
>                    , tblVals :: [String]
>                    }
>   deriving (Show)

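That is, we record the table's name, the list of column names, and the list of column values.

Each table in the input begins with a line that starts with TABLE, so split all the lines of the input into chunks accordingly: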

> tables :: [String] -> [Table]
> tables xs = case dropWhile (not . isTable) xs of
>   []        -> []   -- no TABLE line left, so we are done
>   (th:rest) -> let (tt,ys) = break isTable rest
>                in mkTable (th:tt) : tables ys
>   where isTable = ("TABLE" `isPrefixOf`)

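Once the input has been grouped into tables, a given table's name is the first word after the keyword on the TABLE line. The column names are all the words that appear on the COLUMNS line, and the column values come from the VALUES lines: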

> mkTable :: [String] -> Table
> mkTable xs = Table name cols vals
>   where name = head $ fromJust $ lookup "TABLE" tagged
>         cols = grab "COLUMNS"
>         vals = grab "VALUES"
>         grab t = concatMap snd $ filter ((== t) . fst) tagged
>         tagged = map (head &&& tail)
>                $ filter (not . null)   -- drops empty and whitespace-only lines
>                $ map words xs
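To make the tagging step concrete, here is a small standalone sketch of the same idea: pair each line's leading keyword with the remaining words, then collect the words for one keyword. The names `tagLines` and `grabWords` are hypothetical and not part of the answer's code. This variant uses a pattern match in a list comprehension instead of `head` and `tail`, which is an assumption about how one might prefer to skip blank lines.

```haskell
-- Hypothetical helper: pair each line's keyword with the words after it.
-- Lines without any word (empty or whitespace-only) fail the pattern
-- match in the comprehension and are skipped.
tagLines :: [String] -> [(String, [String])]
tagLines ls = [ (k, ws) | (k : ws) <- map words ls ]

-- Hypothetical helper: collect the words of every line tagged with keyword t.
grabWords :: String -> [(String, [String])] -> [String]
grabWords t tagged = concat [ ws | (k, ws) <- tagged, k == t ]
```

For the lines `"VALUES 1 2"` and `"VALUES 3 4"`, `grabWords "VALUES"` yields the single list `["1","2","3","4"]`, so the boundary between the two VALUES rows is not preserved.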

Given a Table record, we print it by pasting the names, values, and SQL keywords together in the proper order on a single line:

> main :: IO ()
> main = do
>   input <- readFile "input"
>   forM_ (tables $ lines input) $
>     \t -> do putStrLn $ intercalate " " $
>                 "TABLE"   : (tblName t)  :
>                ("COLUMNS" : (tblCols t)) ++
>                ("VALUES"  : (tblVals t))

Given the unimaginative input

TABLE name_of_table

COLUMNS first_column 2nd_column [..] n-th_column

VALUES 1st_value 2nd_value [...] n-th value

VALUES yet_another_value ... go on

TABLE name_of_table

COLUMNS first_column 2nd_column [..] n-th_column

VALUES 1st_value 2nd_value [...] n-th value

VALUES yet_another_value ... go on

the output is

$ runhaskell table.lhs
TABLE name_of_table COLUMNS first_column 2nd_column [..] n-th_column VALUES 1st_value 2nd_value [...] n-th value yet_another_value ... go on
TABLE name_of_table COLUMNS first_column 2nd_column [..] n-th_column VALUES 1st_value 2nd_value [...] n-th value yet_another_value ... go on
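
The output above merges all VALUES rows of a table behind a single VALUES keyword. If one output line per VALUES row is wanted, as in the question, one possible approach is to carry the current TABLE and COLUMNS lines along as extra arguments of the recursion, so the prefix is still available when only the tail of the input remains. The following is a minimal sketch, not the answer's method. `expand` is a hypothetical name, and the code assumes every line starts directly with its keyword and that each TABLE line is followed by a COLUMNS line.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical sketch: walk the lines once, remembering the most recent
-- TABLE and COLUMNS lines, and emit one output line per VALUES line.
expand :: [String] -> [String]
expand = go "" ""
  where
    go _ _ [] = []
    go tbl cols (l : ls)
      | "TABLE"   `isPrefixOf` l = go l "" ls        -- new table: forget old columns
      | "COLUMNS" `isPrefixOf` l = go tbl l ls       -- remember this table's columns
      | "VALUES"  `isPrefixOf` l = unwords [tbl, cols, l] : go tbl cols ls
      | otherwise                = go tbl cols ls    -- skip blank or unknown lines
```

In a program this could be wired up as `interact (unlines . expand . lines)`.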