Question

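I'm working on a function that, among other tasks, should read a CSV file with pandas. One of its arguments should be the separator, passed as a string. However, for some reason (possibly related to regular expressions), pandas completely ignores the separator I pass and defaults to '\t', which does not parse my data correctly.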

import pandas as pd

def open_df(separator):
    # Forward the caller's separator string unchanged to read_csv
    df = pd.read_csv('filename.csv', sep=separator)
    return df
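One way to check whether the separator really reaches read_csv is to feed the function in-memory data instead of a file. The sketch below is an assumption-laden variant: the extra `source` parameter is hypothetical (added only so the function can be tried without filename.csv), and the sample data is made up.

```python
import io

import pandas as pd


def open_df(source, separator):
    """Read delimited text, forwarding `separator` unchanged to read_csv.

    `source` is a hypothetical extra parameter (path or file-like object)
    so the function can be tried without a file named filename.csv.
    """
    return pd.read_csv(source, sep=separator)


# Tab-separated data: "\t" here is a real tab character (one char).
tsv = io.StringIO("name\tscore\nann\t1\nbob\t2\n")
df_tab = open_df(tsv, "\t")
print(df_tab.columns.tolist())  # ['name', 'score']

# Semicolon-separated data read through the same function.
ssv = io.StringIO("name;score\nann;1\nbob;2\n")
df_semi = open_df(ssv, ";")
print(df_semi.shape)  # (2, 2)
```

If both calls split the columns as expected, the function is passing `sep` through correctly and the problem is more likely in the string value being passed in.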

So my question is: how should I pass the separator argument in this case?

Answer 1

Please check this link: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

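sep : str, default ','
Delimiter to use. If sep is None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used and automatically detect the separator by Python's builtin sniffer tool, csv.Sniffer. In addition, separators longer than 1 character and different from '\s+' will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'.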
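The two behaviours described in that documentation excerpt (sniffing with sep=None, and regex separators) can be illustrated with a small sketch. The sample data and the regex `[;|]` are made up for the example.

```python
import io

import pandas as pd

# sep=None: the Python engine sniffs the delimiter (via csv.Sniffer)
# from the first line of the data.
sniffed = pd.read_csv(io.StringIO("a;b;c\n1;2;3\n"), sep=None, engine="python")
print(sniffed.columns.tolist())  # ['a', 'b', 'c']

# A separator longer than one character (other than r"\s+") is treated as
# a regular expression; here either ';' or '|' splits the fields.
regex_df = pd.read_csv(io.StringIO("a;b|c\n1;2|3\n"), sep=r"[;|]", engine="python")
print(regex_df.columns.tolist())  # ['a', 'b', 'c']
```

Sniffing is a heuristic, so for real data passing an explicit separator is usually the safer choice.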

Answer 2

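I passed the separator string as a "raw" string, and that worked fine for me. In a raw string, \ is interpreted as an ordinary character, and \t works correctly.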

When calling open_df(), you have to put an r before the string's opening quote: open_df(r"\t").
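A small sketch of why both spellings can split tab-separated data: "\t" is a single tab character, while r"\t" is two characters (backslash and t), which, per the read_csv documentation quoted in the other answer, is longer than one character and therefore interpreted as the regex \t, which also matches a tab. The sample data is made up for the example.

```python
import io

import pandas as pd

plain = "\t"  # escape sequence: one real tab character
raw = r"\t"   # raw string: a backslash followed by the letter t
print(len(plain), len(raw))  # 1 2

data = "x\ty\n1\t2\n"

# One-character separator: used literally by the default C engine.
df_plain = pd.read_csv(io.StringIO(data), sep=plain)

# Two-character separator: treated as the regex \t, which also matches
# a tab; engine="python" avoids the fallback warning.
df_raw = pd.read_csv(io.StringIO(data), sep=raw, engine="python")

print(df_plain.equals(df_raw))  # True
```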

Example:

test_string = r"\t\n"
print(test_string)
# Output: \t\n

I also passed "python" as the engine argument so that the parser warning is not shown :-).
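As a sketch of what that warning is: with a multi-character (regex) separator and no engine given, pandas falls back from the C engine to the Python engine and emits a pandas.errors.ParserWarning. Requesting engine="python" up front avoids the fallback. The helper name parser_warnings and the sample data are hypothetical, for illustration only.

```python
import io
import warnings

import pandas as pd
from pandas.errors import ParserWarning

data = "x\ty\n1\t2\n"


def parser_warnings(**kwargs):
    """Hypothetical helper: read `data`, return any ParserWarnings raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        pd.read_csv(io.StringIO(data), sep=r"\t", **kwargs)
    return [w for w in caught if issubclass(w.category, ParserWarning)]


# No engine given: pandas falls back from the C engine and warns.
print(len(parser_warnings()) > 0)  # True
# engine="python" requested explicitly: no fallback, no warning.
print(len(parser_warnings(engine="python")))  # 0
```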

Pandas ignores the separator passed as an argument
