Question

I am exploring a CSV file in an interactive GHCi session (inside a Jupyter notebook):

import Text.CSV
import Data.List
import Data.Maybe

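-- parseCSVFromFile returns IO (Either ParseError CSV); this pattern
-- binding throws if the file cannot be parsed
Right dat <- parseCSVFromFile "/home/user/data.csv"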
headers = head dat
records = tail dat

-- define a way to get a particular row by index
indexRow :: [[Field]] -> Int -> [Field]
indexRow csv index = csv !! index

indexRow records 1
-- this works! 

-- Now, define a way to get a particular column by index
indexField :: [[Field]] -> Int -> [Field]
indexField csv index = map (!! index) csv

This works if I know the type of column 3 in advance:

map (\x -> read x :: Double) $ indexField records 3

How can I ask read to infer the type when my column may contain either strings or numbers? I wanted to try this, but:

map read $ indexField records 3

fails with

Prelude.read: no parse

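I don't care whether they are strings or numbers; I just need them all to be the same type, and I haven't found a way to specify that, at least not with the read function.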

Strangely, if I define a mean function like this:

mean :: Fractional a => [a] -> Maybe a
mean [] = Nothing
mean [x] = Just x
mean xs = Just (sum xs / fromIntegral (length xs))

this works:

mean $ map read $ indexField records 2
Just 13.501359655240003

But this, without mean, fails:

map read $ indexField records 2
Prelude.read: no parse

Answer 1

Unfortunately, read has hit a dead end here. Let's revisit its type:

read :: Read a => String -> a

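As you can see, a doesn't depend on the input but only on the output, and therefore on the context in which we use the function. If we use read a + read b, the additional Num context restricts the type to Integer or Double through the defaulting rules. Let's see it in action: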

> :set +t
> read "1234"
*** Exception: Prelude.read: no parse
> read "1234" + read "1234"
2468
it :: (Num a, Read a) => a

OK, a still doesn't help. Is there anything we can read without additional context? Sure, unit:

> read "()"
()
it :: Read a => a

That still doesn't help, so let's enable the monomorphism restriction:

> :set -XMonomorphismRestriction
> read "1234" + read "1234"
2468
it :: Integer

Aha. Finally, we have an Integer: because of +, the type had to be fixed. Now, with MonomorphismRestriction enabled, what happens to read "1234" without any additional context?

> read "1234"
<interactive>:20:1
   No instance for (Read a0) arising from a use of 'read'
   The type variable 'a0' is ambiguous

Now GHCi doesn't pick any (default) type and forces you to choose one, which makes the underlying error much clearer.
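A small sketch of what "choosing one" looks like. It assumes nothing beyond base; the names asInt, asDouble, asUnit and parsedColumn are hypothetical. The same string is parsed at different result types, and only the type decides which Read instance runs. The last binding pins the type at the call site with a visible type application (the TypeApplications extension, available since GHC 8.0) instead of a signature.

```haskell
{-# LANGUAGE TypeApplications #-}

import Text.Read (readMaybe)

-- Same input string, different result types: the type, not the input,
-- selects the Read instance.
asInt :: Maybe Int
asInt = readMaybe "1234"

asDouble :: Maybe Double
asDouble = readMaybe "1234"

-- "1234" is not the textual form of (), so this is Nothing. GHCi's
-- extended defaulting tries () first for an otherwise ambiguous
-- (Read a, Show a), which is why a bare read "1234" fails there.
asUnit :: Maybe ()
asUnit = readMaybe "1234"

-- No signature: the type application alone fixes the element type.
parsedColumn = map (read @Double) ["13.5", "2", "0.25"]
```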

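So how do we fix this? Since a CSV can contain arbitrary fields at runtime, while all types are determined statically, we have to cheat by introducing something like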

data CSVField = CSVString String | CSVNumber Double | CSVUnknown

and then writing

parse :: Field -> CSVField

After all, our type needs to cover every possible field.
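One possible body for parse, as a minimal sketch. The classification rules are assumptions of this sketch, not something the answer specifies: an empty cell becomes CSVUnknown, anything Read accepts as a Double becomes CSVNumber, and everything else is kept as CSVString. Field is redeclared as String (the synonym Text.CSV uses) so the snippet compiles without the csv package.

```haskell
import Text.Read (readMaybe)

-- Text.CSV defines Field as a synonym for String; repeated here so the
-- sketch is self-contained.
type Field = String

data CSVField = CSVString String | CSVNumber Double | CSVUnknown
  deriving (Show, Eq)

-- Assumed rules: empty means unknown, a Double literal means a number,
-- anything else stays a string.
parse :: Field -> CSVField
parse "" = CSVUnknown
parse s = case readMaybe s of
  Just d  -> CSVNumber d
  Nothing -> CSVString s
```

A whole column then becomes map parse (indexField records 3) :: [CSVField], with no type to guess. Note that readMaybe follows Haskell's literal syntax, so cells such as ".5" or "1,234" end up as CSVString.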

In your case, however, we can simply restrict read's type:

myRead :: String -> Double
myRead = read

But that isn't a wise choice, since we would still end up with an error if the column doesn't contain Doubles in the first place. So let's use readMaybe and mapM:

import Text.Read (readMaybe)

columnAsNumbers :: [Field] -> Maybe [Double]
columnAsNumbers = mapM readMaybe

That way, the type is fixed, and we are forced to check whether we got Just something or Nothing:

mean <$> columnAsNumbers (indexField records 2)
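A self-contained sketch of how columnAsNumbers behaves, using in-memory cells instead of the file. mapM in the Maybe monad is all-or-nothing: one unparsable cell makes the whole column Nothing. For contrast it also shows a hypothetical parsableNumbers (not part of the answer) built on mapMaybe, which silently drops bad cells, and a hypothetical strictMean that uses >>= so the result is Maybe Double rather than the Maybe (Maybe Double) that mean <$> produces.

```haskell
import Data.Maybe (mapMaybe)
import Text.Read (readMaybe)

type Field = String   -- the same synonym Text.CSV exports

-- All-or-nothing: Just the whole column only if every cell parses.
columnAsNumbers :: [Field] -> Maybe [Double]
columnAsNumbers = mapM readMaybe

-- Hypothetical alternative: keep the cells that parse, drop the rest.
parsableNumbers :: [Field] -> [Double]
parsableNumbers = mapMaybe readMaybe

mean :: Fractional a => [a] -> Maybe a
mean [] = Nothing
mean xs = Just (sum xs / fromIntegral (length xs))

-- (>>=) flattens the two layers of Maybe into one.
strictMean :: [Field] -> Maybe Double
strictMean column = columnAsNumbers column >>= mean
```

Which of the two is appropriate depends on whether a stray "n/a" should invalidate the column or just be ignored.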

And if you find yourself using columnAsNumbers a lot, create an operator:

(!!$) :: [[Field]] -> Int -> Maybe [Double]
records !!$ index = columnAsNumbers $ indexField records index
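A runnable sketch of the operator on a hypothetical in-memory sampleRecords standing in for the parsed file. The explicit fixity is this sketch's own choice: an undeclared operator already defaults to infixl 9, the same as (!!), and writing it out simply makes that visible.

```haskell
import Text.Read (readMaybe)

type Field = String   -- the same synonym Text.CSV exports

indexField :: [[Field]] -> Int -> [Field]
indexField rows index = map (!! index) rows

columnAsNumbers :: [Field] -> Maybe [Double]
columnAsNumbers = mapM readMaybe

-- Same as the default fixity, matching (!!); binds tighter than (>>=),
-- so rows !!$ 2 >>= f means (rows !!$ 2) >>= f.
infixl 9 !!$

-- The operator needs both the records and the column index.
(!!$) :: [[Field]] -> Int -> Maybe [Double]
rows !!$ index = columnAsNumbers (indexField rows index)

-- Hypothetical stand-in for the records read from the file.
sampleRecords :: [[Field]]
sampleRecords =
  [ ["alice", "30", "12.5"]
  , ["bob",   "25", "14.5"]
  ]
```

Here sampleRecords !!$ 2 is Just [12.5, 14.5] and sampleRecords !!$ 0 is Nothing. indexField still relies on (!!), so a row shorter than the index (for example an empty record left by a trailing newline) will throw rather than yield Nothing.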

Getting a column and inferring its type with Haskell CSV
