I have a CSV file with nearly 10,000,000 rows, structured as follows:
date , code , ret
2001-01-01,000001,0.1
2001-01-01,000002,0.01
2001-01-02,000001,0.05
2001-01-02,000002,0.02
The "date" and "code" fields simply serve as keys. I want to subset the file quickly, for example:
subset(code='000001')
date , code , ret
2001-01-01,000001,0.1
2001-01-02,000001,0.05
or
subset(date='2001-01-01')
date , code , ret
2001-01-01,000001,0.1
2001-01-01,000002,0.01
Which data structure should I choose to make these lookups efficient?
Answer 0 (score: 1)
Take a look at the CSV Type Provider in the F# Data project:
https://fsharp.github.io/FSharp.Data/library/CsvProvider.html
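You can use it as the underlying data structure for reading the file, then easily parse the rows into a structure better optimized for fast access, as @MarcinJuraszek mentioned.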
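One possible shape for such an "optimized structure" is a pair of hash maps, one keyed by code and one keyed by date, built in a single pass over the file. The sketch below is only an illustration and not the method from the comment referenced above: it uses Python rather than F#, the function name `build_indexes` is hypothetical, and it assumes the whole file fits in memory. The same layout should carry over to two `Dictionary` instances in F#.

```python
import csv
import io
from collections import defaultdict


def build_indexes(lines):
    """Read the rows once and index them by both key columns.

    `lines` is any iterable of CSV text lines (an open file works too).
    Returns two dicts: code -> list of rows, date -> list of rows.
    """
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader)  # skip the "date , code , ret" header line

    by_code = defaultdict(list)
    by_date = defaultdict(list)
    for record in reader:
        if len(record) != 3:
            continue  # ignore blank or malformed lines
        date, code, ret = (field.strip() for field in record)
        row = (date, code, float(ret))
        # The same tuple object is referenced from both indexes,
        # so each row is stored only once.
        by_code[code].append(row)
        by_date[date].append(row)
    return by_code, by_date


# Sample data copied from the question; a real run would pass
# open("data.csv", newline="") instead (the file name is an assumption).
sample = io.StringIO(
    "date , code , ret\n"
    "2001-01-01,000001,0.1\n"
    "2001-01-01,000002,0.01\n"
    "2001-01-02,000001,0.05\n"
    "2001-01-02,000002,0.02\n"
)
by_code, by_date = build_indexes(sample)

for row in by_code["000001"]:      # subset(code='000001')
    print(row)
for row in by_date["2001-01-01"]:  # subset(date='2001-01-01')
    print(row)
```

After the one-time pass, each subset is a single dictionary lookup that returns the matching rows directly instead of rescanning the file. The trade-off is memory: every row stays resident, plus two lists of references to it.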