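Is there a way to make CsvProvider
run injected code that tries to produce valid values for malformed rows? CsvProvider<IgnoreErrors=true>
skips any malformed rows, but then their data is lost.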
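As a specific example, CsvProvider fails on row 679644 of the IMDB movie titles file, an otherwise very clean 9-column tab-separated file, with an error about an unexpected number of columns being detected. In this case the double quote appears to be the problem, because changing it to another character lets the row parse successfully.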
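open System.IO
open FSharp.Data

// Typed view of the IMDB title file; "\N" marks a missing value.
type Movies = CsvProvider< @"c:\title.sample.tsv", MissingValues="\N", CacheRows=false,
                           Schema="string,string,string,string,boolean, int option, int option, int option, string",
                           IgnoreErrors=false >

// Load the file and keep only the rows whose title type is "movie".
let getMovies path =
    use file = File.OpenRead(path)
    use reader = new StreamReader(file)
    let movies = Movies.Load(reader)
    // Materialize the rows before the reader is disposed (CacheRows=false).
    movies.Filter(fun r -> r.TitleType = "movie").Rows
    |> Seq.toArray

getMovies @"c:\title.basics.20171218.tsv"
|> Seq.iter (fun t -> printfn "%A" t)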
Data slice:
tt0701219	tvEpisode	Summer of 4'2"	Summer of 4'2"	0	1996	\N	30	Animation,Comedy
tt0701220	tvEpisode	Sunday, Cruddy Sunday	Sunday, Cruddy Sunday	0	1999	\N	30	Animation,Comedy
tt0701221	tvEpisode	Sweets and Sour Marge	Sweets and Sour Marge	0	2002	\N	30	Animation,Comedy
tt0701222	tvEpisode	Take My Wife, Sleaze	Take My Wife, Sleaze	0	1999	\N	30	Animation,Comedy
tt0701223	tvEpisode	Tennis the Menace	Tennis the Menace	0	2001	\N	30	Animation,Comedy
tt0701224	tvEpisode	Thank God It's Doomsday	Thank God It's Doomsday	0	2005	\N	30	Animation,Comedy
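
For illustration, the workaround of changing the double quotes to another character can be applied outside CsvProvider by writing a sanitized copy of the file before loading it. This is only a minimal sketch: the names repairLine and writeSanitizedCopy are hypothetical, the choice of the double prime character (U+2033) as the substitute is an assumption, and it relies only on System.IO rather than on any hook inside CsvProvider.

```fsharp
open System.IO

/// Hypothetical repair step: replace each double quote with a double prime (U+2033)
/// so that a CSV parser no longer sees it as the start of a quoted field.
let repairLine (line: string) : string =
    line.Replace('"', '\u2033')

/// Hypothetical helper: stream the source file line by line into a sanitized copy.
/// File.ReadLines is lazy, so the whole file is never held in memory at once.
let writeSanitizedCopy (source: string) (target: string) : unit =
    File.WriteAllLines(target, File.ReadLines source |> Seq.map repairLine)
```

The sanitized copy could then be passed to Movies.Load in place of the original path. The obvious drawbacks are the extra pass over the file and the fact that the loaded titles no longer contain the original quote character.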
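What I would like to determine is whether CsvProvider has a hook that lets the consumer inject code to help parse rows as they are read from the source file.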