Question

I'm trying to read a large dataset. I think my main problem is that I don't know how to handle lists properly. What I want to do is:

Get the raw data.
Drop the empty lists from the raw data.
Read the features present in the [[String]]: [####, A/B, #, #, #, #]
Put the result into (Double, [Double]): [1/0, #, #, #, #]

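So far I have two functions, and the next step is to implement a third function that calls them. The longer-term goal is to parse the dataset into a list of lists of Doubles, where each instance has a class label and a feature vector.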

For now, though, I would be happy just to get it working on a single "dummy" string. Baby steps.

The functions I have already implemented, and which work on their own, are "numericLabel" and "numericFeatures".

Example of the raw data as a [String]: ["32142", "B", "1", "2", "0.4", "3", "2"], meaning (ID#, class label, features).

-- Label must be a Double so it fits the (Double, [Double]) result
numericLabel :: [Char] -> Double
numericLabel x
    | isBenign x = 1
    | isMalign x = -1
    | otherwise  = 0



numericFeatures :: [String] -> [Double]
numericFeatures [] = []
numericFeatures (x:xs) = (read x :: Double) : numericFeatures xs
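Because `read` throws a runtime error on any field that is not a number, a total variant may help once real data is involved. A minimal sketch, assuming you would rather get `Nothing` than a crash; `numericFeaturesSafe` is a hypothetical name, while `readMaybe` (from `Text.Read` in base) and `traverse` are standard:

```haskell
import Text.Read (readMaybe)

-- Hypothetical safe variant of numericFeatures: parse every field as a
-- Double, returning Nothing if any single field is not a valid number.
numericFeaturesSafe :: [String] -> Maybe [Double]
numericFeaturesSafe = traverse readMaybe
```

`traverse` stops at the first failure, so one bad field makes the whole row `Nothing`.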

Both functions work when called individually, e.g. numericLabel "B" (or "M") and numericFeatures ["1", "2", "0.4", "3", "2"]. The first returns a Double, the second a [Double]. Both read strings.

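The function I'm having trouble implementing is one that takes a whole [String], including the redundant "ID", and converts the label character to a Double and the remaining [String] to [Double].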

Goal: call processItem ["32142", "B", "1", "2", "0.4", "3", "2"], which in turn calls the functions above on the different parts of the item and returns a (Double, [Double]). For this example: (1.0, [1.0, 2.0, 0.4, 3.0, 2.0]).
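One way to get that shape is plain pattern matching on the list: bind the first element (the ID) and ignore it, bind the second as the label, and treat the rest as features. A sketch under stated assumptions: "B" maps to 1 and "M" to -1 (matching the expected output for this example), and every feature field is a valid number, since `read` crashes otherwise. The helpers are redefined here so the block stands alone:

```haskell
type Label = Double
type Feats = [Double]

-- Assumed mapping: "B" (benign) -> 1, "M" (malignant) -> -1, anything else -> 0.
numericLabel :: String -> Label
numericLabel "B" = 1
numericLabel "M" = -1
numericLabel _   = 0

-- Same behaviour as the recursive version in the question.
numericFeatures :: [String] -> Feats
numericFeatures = map read

-- A record looks like [id, label, feature1, feature2, ...].
-- The pattern discards the ID and hands the other parts to the helpers.
processItem :: [String] -> (Label, Feats)
processItem (_recordId : label : feats) = (numericLabel label, numericFeatures feats)
processItem row = error ("processItem: malformed row " ++ show row)
```

For example, `processItem ["32142", "B", "1", "2", "0.4", "3", "2"]` evaluates to `(1.0, [1.0, 2.0, 0.4, 3.0, 2.0])`.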

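PARSING: As benjic mentioned, parsing might be the solution I want, if only I knew what I was doing. But at my skill level, trying to adapt the parsing examples is nearly impossible.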
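If a parsing library feels like too much for now, a dataset without quoted fields can be split by hand. A minimal sketch, assuming plain comma-separated lines with no quoting, no escaped commas and Unix line endings; `splitOnComma` and `parseSimpleCSV` are hypothetical names, not part of any library:

```haskell
-- Split one line into fields at every comma.
-- Assumes no quoted fields and no escaped commas.
splitOnComma :: String -> [String]
splitOnComma s = case break (== ',') s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitOnComma rest

-- Turn a whole file's contents into rows, skipping blank lines.
-- Assumes "\n" line endings (a trailing '\r' would stay in the last field).
parseSimpleCSV :: String -> [[String]]
parseSimpleCSV = map splitOnComma . filter (not . null) . lines
```

A proper CSV parser is still the better choice once fields can contain commas or quotes.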

Note: what follows is information I think is unnecessary; I believe all of it is covered in this "EDIT". I've left it in case it's needed.

import Text.CSV

type Label = Double
type Feats = [Double]

getRawData' :: String -> IO [[String]]
getRawData' fn = do
    s <- readFile fn 
    return $ parseCSVsimple s 
-- Return: Wraps given value in an IO action. 

getRawData :: String -> IO [[String]]
getRawData fn = do
    d <- getRawData' fn
    return (dropEmpty d)

dropEmpty :: [[String]] -> [[String]]
dropEmpty d = filter (not . null) d -- keep only non-empty rows
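The remaining steps can be chained as a pure function over the rows that `getRawData` returns. This sketch assumes it is acceptable to silently skip rows that cannot be converted; `processItemMaybe` and `processAll` are hypothetical names, and the label mapping "B" to 1, "M" to -1 is an assumption taken from the expected output in the question:

```haskell
import Data.Maybe (mapMaybe)
import Text.Read (readMaybe)

type Label = Double
type Feats = [Double]

dropEmpty :: [[String]] -> [[String]]
dropEmpty = filter (not . null)

-- Hypothetical total variant of processItem: Nothing for rows that are too
-- short, have an unknown label, or contain a non-numeric feature.
processItemMaybe :: [String] -> Maybe (Label, Feats)
processItemMaybe (_recordId : label : feats) = do
  l  <- lookup label [("B", 1), ("M", -1)]
  fs <- traverse readMaybe feats
  Just (l, fs)
processItemMaybe _ = Nothing

-- Whole pipeline on already-parsed rows: drop empty rows, then convert
-- every remaining row, skipping the ones that fail.
processAll :: [[String]] -> [(Label, Feats)]
processAll = mapMaybe processItemMaybe . dropEmpty
```

In `IO`, this would combine with the existing loader as `processAll <$> getRawData fn`. The `dropEmpty` step is kept to mirror the question's steps; `processItemMaybe` already returns `Nothing` for rows with fewer than two fields, such as `[]` or `[""]`.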

-- "B" marks a benign sample and "M" a malignant one
isBenign :: [Char] -> Bool
isBenign x = x == "B"
isMalign :: [Char] -> Bool
isMalign x = x == "M"

-- True for a non-empty field made only of upper-case letters, e.g. "B"
isChar :: [Char] -> Bool
isChar s = not (null s) && all (`elem` ['A'..'Z']) s

-- Label (= Double) so it fits the (Label, Feats) result
numericLabel :: [Char] -> Label
numericLabel x
    | isBenign x = 1
    | isMalign x = -1
    | otherwise  = 0

numericFeatures :: [String] -> [Double]
numericFeatures [] = []
numericFeatures (x:xs) = (read x :: Double) : numericFeatures xs


Everything below is just garbage, and it is what I need help with:
--processItem :: [String] -> (Label,Feats)
--processItem st takeWhile (not . isChar)  
--processItem   Label = numericLabel filter (isChar) x:xs 
--processItem   Feats = numericFeatures filter (not . isChar) x:xs
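The filter idea in the attempt above can also be made to work, as long as the ID is removed first (it contains no letters, so `filter (not . isChar)` alone would keep it as a feature). A sketch using `partition` from `Data.List`, assuming exactly one all-capitals label field per row; `processItemByKind` is a hypothetical name, and `isChar` is redefined here to take a whole `String`:

```haskell
import Data.List (partition)

-- A field counts as a label if it is non-empty and made only of 'A'..'Z'
-- (so "B" qualifies, while "32142" and "0.4" do not).
isChar :: String -> Bool
isChar s = not (null s) && all (`elem` ['A' .. 'Z']) s

-- Assumed mapping: "B" (benign) -> 1, "M" (malignant) -> -1, anything else -> 0.
numericLabel :: String -> Double
numericLabel "B" = 1
numericLabel "M" = -1
numericLabel _   = 0

-- Drop the ID, then let `partition` split the remaining fields into the
-- letter field (the label) and the numeric fields (the features).
processItemByKind :: [String] -> (Double, [Double])
processItemByKind (_recordId : rest) =
  case partition isChar rest of
    ([label], feats) -> (numericLabel label, map read feats)
    _ -> error "processItemByKind: expected exactly one label field"
processItemByKind [] = error "processItemByKind: empty row"
```

Unlike the positional pattern match, this version does not care where the label sits after the ID, at the cost of assuming that no feature field is ever made of letters.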

Haskell: from [[String]] to class & feature vector (Double, [Double])
