Question

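I am looking for a way to read a CSV file into a pandas DataFrame while skipping one particular line of the input file, a line that appears after an unknown number of comment lines. By "unknown" I mean that I want my code to handle multiple data files, and the number of comment lines differs from file to file. The CSV data files look like this: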

#
# unknown
# number of comment
# rows
#
columnname1 columnname2 columnname3
line containing stuff I want to ignore (does NOT start with a comment char)
1.2 3.4 5.6
2.3 4.5 6.7
3.4 5.6 7.8
...

My first idea was to use

pd.read_csv(filename, comment="#", skiprows=[1])

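to say that the second line should be skipped, but it turns out that read_csv() counts comment lines when assigning line numbers. As a result, skiprows=[1] actually makes read_csv() skip the second line of the file (a comment line) rather than the second non-comment line.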

I am currently working around this with the following function:

import io
import pandas as pd

def read_csv(file, *args, **kwargs):
    # Drop comment lines up front so that skiprows counts only non-comment lines.
    comment = kwargs.get("comment")
    with open(file) as f:
        lines = "".join(line for line in f
                        if not comment or not line.startswith(comment))
    return pd.read_csv(io.StringIO(lines), *args, **kwargs)

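This works fine, but I would like to know whether there is a more direct way, such as a combination of read_csv() options that makes it leave comment lines out of the line count, and/or a way to filter the data that read_csv() sees without loading the entire file into memory.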
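One possible direction, given only as a rough sketch: since skiprows counts physical file lines (comment lines included), the index of the unwanted line could be found by scanning just the top of the file and then passed to read_csv(). This assumes that all comment lines come before the header and that the line to ignore directly follows the header, as in the sample above. The names junk_line_index and read_data are made up for illustration and are not part of pandas.

```python
def junk_line_index(lines, comment="#"):
    """Return the 0-based file line number of the line right after the header.

    Assumption: every comment line precedes the header, and the line to be
    ignored is the one immediately following the header.
    """
    for i, line in enumerate(lines):
        if not line.startswith(comment):
            # Line i is the header, so the line to skip is the next one.
            return i + 1
    raise ValueError("no non-comment line found")


def read_data(path, comment="#", **kwargs):
    """Hypothetical wrapper: skip the line after the header, whatever its position."""
    import pandas as pd  # imported here so the helper above needs only the stdlib

    with open(path) as f:
        # Iteration stops at the header, so the file is not read to the end here.
        skip = junk_line_index(f, comment)
    # skiprows uses physical line numbers, which is what the helper returns.
    return pd.read_csv(path, comment=comment, skiprows=[skip], **kwargs)
```

For the whitespace-separated sample shown earlier, this would presumably be called as read_data(filename, sep=r"\s+"). The file is opened twice, but the first pass only touches the leading lines, so nothing is buffered in memory beyond what read_csv() itself needs.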

Any suggestions?

Ignoring comment lines when counting line numbers in pandas read_csv

0 answers: