NimbleCSV: Elixir

Time: 2020-07-04 10:27:21

Tags: csv erlang elixir

I am trying to use the NimbleCSV library for a personal project, but I am running into a problem...

NimbleCSV.define(MyParser, separator: ",", escape: "\"")

defmodule Siren do
  def parseCSV do
    IO.puts("Let's parse CSV file!")

    File.stream!("name.csv")
    |> MyParser.parse_stream()
    |> Stream.map(fn [name, team, position, height, weight, age] ->
      %{
        name: name,
        team: team,
        position: position,
        height: String.to_integer(height),
        weight: String.to_integer(weight),
        age: String.to_integer(age)
      }
    end)
    |> Enum.map(&IO.puts(&1))
  end
end

As you can see above, I am using a Stream, but when I run the Mix task, it crashes:

➜  siren mix siren
Compiling 1 file (.ex)
Let's parse CSV file!
** (NimbleCSV.ParseError) unexpected escape character " in " \"Team\", \"Position\", \"Height(inches)\", \"Weight(lbs)\", \"Age\"\n"
    deps/nimble_csv/lib/nimble_csv.ex:427: MyParser.separator/5
    deps/nimble_csv/lib/nimble_csv.ex:360: anonymous fn/4 in MyParser.parse_stream/2
    (elixir 1.10.3) lib/stream.ex:902: Stream.do_transform_user/6
    (elixir 1.10.3) lib/stream.ex:1609: Enumerable.Stream.do_each/4
    (elixir 1.10.3) lib/enum.ex:3383: Enum.map/2
    (mix 1.10.3) lib/mix/task.ex:330: Mix.Task.run_task/3
    (mix 1.10.3) lib/mix/cli.ex:82: Mix.CLI.run_task/2

Here is my CSV file:

"Name", "Team", "Position", "Height(inches)", "Weight(lbs)", "Age"
"Adam Donachie", "BAL", "Catcher", 74, 180, 22.99
"Paul Bako", "BAL", "Catcher", 74, 215, 34.69
"Ramon Hernandez", "BAL", "Catcher", 72, 210, 30.78
"Kevin Millar", "BAL", "First Baseman", 72, 210, 35.43
"Chris Gomez", "BAL", "First Baseman", 73, 188, 35.71
"Brian Roberts", "BAL", "Second Baseman", 69, 176, 29.39
"Miguel Tejada", "BAL", "Shortstop", 69, 209, 30.77
"Melvin Mora", "BAL", "Third Baseman", 71, 200, 35.07
"Aubrey Huff", "BAL", "Third Baseman", 76, 231, 30.19
"Adam Stern", "BAL", "Outfielder", 71, 180, 27.05
"Jeff Fiorentino", "BAL", "Outfielder", 73, 188, 23.88
"Freddie Bynum", "BAL", "Outfielder", 73, 180, 26.96
"Nick Markakis", "BAL", "Outfielder", 74, 185, 23.29
"Brandon Fahey", "BAL", "Outfielder", 74, 160, 26.11
"Corey Patterson", "BAL", "Outfielder", 69, 180, 27.55

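The problem must come from the escape character I defined earlier, but I don't understand why. What is the escape character here? As I understand it, it is the double quote around each string in the CSV rows.

1 answer:

Answer 0: (score: 2)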

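CSV stands for comma-separated values, and the format has its own specification, RFC 4180. You cannot put spaces wherever you like. Change the input to what is shown below and everything works. The problem is the space after each comma: in other words, the escape character (the opening double quote) must immediately follow the separator.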

"Name","Team","Position","Height(inches)","Weight(lbs)","Age"
"Adam Donachie","BAL","Catcher",74,180,22.99
"Paul Bako","BAL","Catcher",74,215,34.69
"Ramon Hernandez","BAL","Catcher",72,210,30.78
"Kevin Millar","BAL","First Baseman",72,210,35.43
"Chris Gomez","BAL","First Baseman",73,188,35.71
"Brian Roberts","BAL","Second Baseman",69,176,29.39
"Miguel Tejada","BAL","Shortstop",69,209,30.77
"Melvin Mora","BAL","Third Baseman",71,200,35.07
"Aubrey Huff","BAL","Third Baseman",76,231,30.19
"Adam Stern","BAL","Outfielder",71,180,27.05
"Jeff Fiorentino","BAL","Outfielder",73,188,23.88
"Freddie Bynum","BAL","Outfielder",73,180,26.96
"Nick Markakis","BAL","Outfielder",74,185,23.29
"Brandon Fahey","BAL","Outfielder",74,160,26.11
"Corey Patterson","BAL","Outfielder",69,180,27.55

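The difference between the two inputs can be reproduced without a file. The following is a minimal sketch, not taken from the original answer: it assumes Elixir 1.12 or later (for Mix.install/1) and a 1.x release of nimble_csv, and the short sample strings are made up for illustration. It uses parse_string/2 with skip_headers: false so that the first row is kept.

```elixir
# Assumption: Elixir 1.12+ so that Mix.install/1 exists. Inside a Mix project,
# add {:nimble_csv, "~> 1.0"} to deps in mix.exs instead of calling Mix.install/1.
Mix.install([{:nimble_csv, "~> 1.0"}])

alias NimbleCSV.RFC4180, as: CSV

# Opening quote directly after the comma: parses without error.
# skip_headers: false keeps the first row (it is dropped by default).
strict = ~s("Name","Team"\n"Paul Bako","BAL"\n)
strict_rows = CSV.parse_string(strict, skip_headers: false)
IO.inspect(strict_rows)

# A space between the comma and the opening quote: the parser raises.
spaced = ~s("Name", "Team"\n)

spaced_result =
  try do
    CSV.parse_string(spaced, skip_headers: false)
  rescue
    e in NimbleCSV.ParseError -> {:error, Exception.message(e)}
  end

IO.inspect(spaced_result)

# Unquoted fields do not raise, but the space stays part of the value,
# so a later String.to_integer/1 call on " 180" would fail.
unquoted_rows = CSV.parse_string("74, 180\n", skip_headers: false)
IO.inspect(unquoted_rows)
```

Note that removing the spaces matters for the numeric columns too: an unquoted field keeps its leading space, which the integer conversion in the question would reject.
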
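NimbleCSV ships with a default implementation, NimbleCSV.RFC4180, which is exactly what you are using here, so you do not need to define your own parser; just use the default one.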

defmodule Siren do
  def parseCSV do
    IO.puts("Let's parse CSV file!")

    File.stream!("name.csv")
    |> NimbleCSV.RFC4180.parse_stream()
    |> Stream.map(fn [name, team, position, height, weight, age] ->
      %{name: name, team: team, position: position,
        height: String.to_integer(height),
        weight: String.to_integer(weight),
        age: String.to_float(age) # NOTE float here!
      }
    end)
    |> Enum.to_list()
    |> IO.inspect()
  end
end
#⇒ [
#  %{
#    age: 22.99,
#    height: 74,
#    name: "Adam Donachie",
#    position: "Catcher",
#    team: "BAL",
#    weight: 180
#  },
#  ...
# ]
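
The code above differs from the question in two more ways: the age column is converted with String.to_float/1, and the result is printed with IO.inspect/1 rather than IO.puts/1. The sketch below uses only the standard library to show why both changes seem necessary. The to_int helper is hypothetical (it is not part of NimbleCSV) and is only one possible way to tolerate stray spaces.

```elixir
# String.to_integer/1 accepts only an integer literal, so "22.99" raises.
int_result =
  try do
    String.to_integer("22.99")
  rescue
    ArgumentError -> :error
  end

# String.to_float/1 requires a decimal point in its input.
float_value = String.to_float("22.99")

# Float.parse/1 is more lenient: it accepts "22" and returns the unparsed rest.
{lenient, rest} = Float.parse("22")

# Hypothetical helper: trim surrounding whitespace before converting.
to_int = fn field -> field |> String.trim() |> String.to_integer() end
trimmed = to_int.(" 180")

# IO.puts/1 needs the String.Chars protocol, which maps do not implement,
# so printing a row map requires IO.inspect/1 instead.
puts_result =
  try do
    IO.puts(%{name: "Adam Donachie"})
  rescue
    Protocol.UndefinedError -> :error
  end

IO.inspect({int_result, float_value, lenient, rest, trimmed, puts_result})
```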