How do I use PyArrow's `read_csv` to read a CSV with no header and a custom delimiter?

Time: 2019-08-24 08:10:24

Tags: python csv pyarrow apache-arrow

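I have a file that looks like this:

2|1|abc
3|4|def

I tried to read it with:

from pyarrow import csv

a = csv.read_csv("file.csv", parse_options=csv.ParseOptions(delimiter="|", header_rows=0))

This fails with the error below. How do I specify explicit column names? I could not find this in the documentation.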

Traceback (most recent call last):
  File "C:\data\dask\venv\lib\site-packages\IPython\core\interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-15-18e80408b284>", line 2, in <module>
    a = csv.read_csv("c:/data/Performance_All/Performance_2003Q3.txt", parse_options=csv.ParseOptions(delimiter="|", header_rows=0))
  File "pyarrow\_csv.pyx", line 450, in pyarrow._csv.read_csv
  File "pyarrow\error.pxi", line 85, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: header_rows == 0 needs explicit column names

2 Answers:

Answer 0 (score: 3)

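See https://issues.apache.org/jira/browse/ARROW-6231. We are discussing automatic assignment of column names, and your feedback would be very useful. In the meantime, you have to pass explicit column names.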

Answer 1 (score: 2)

The column_names parameter was added in https://issues.apache.org/jira/browse/ARROW-5747 and will be included in the 0.15 release.