Question

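So I've just started diving into the world of Pandas, and the first odd CSV file I came across has two comment lines at the top (with different column widths).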

I know how to skip these lines with skiprows= or header= in read_csv, but how can I keep these comments when reading the file with read_csv? Sometimes the comments are needed as file meta-information, and I don't want to throw them away.

Any ideas, folks? I'd really appreciate any answers.

Answer 1

Pandas is designed to read structured data.

For unstructured data, just use the built-in open:

import csv

with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    row1 = next(reader)  # first line, as a list of fields
    row2 = next(reader)  # second line, as a list of fields
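The same csv.reader can keep going after the comment rows, because each next() call consumes exactly one physical row regardless of how many fields it has. A minimal sketch, assuming the file has exactly two comment rows followed by a normal header row; SAMPLE and split_metadata are hypothetical names, and io.StringIO stands in for a real file:

```python
import csv
import io

# Hypothetical sample: two metadata rows of different widths, then a regular table.
SAMPLE = (
    "sometext, sometext2\n"
    "moretext, moretext1, moretext2\n"
    "a,b\n"
    "1,2\n"
    "3,4\n"
)

def split_metadata(f, n_meta=2):
    """Return (metadata_rows, header, data_rows) from an open text file."""
    # skipinitialspace=True drops the blank after each comma in the comment rows.
    reader = csv.reader(f, skipinitialspace=True)
    meta = [next(reader) for _ in range(n_meta)]  # one row per call, any width
    header = next(reader)                         # first row after the comments
    rows = list(reader)                           # remaining table rows
    return meta, header, rows

with io.StringIO(SAMPLE) as f:
    meta, header, rows = split_metadata(f)

print(meta)    # [['sometext', 'sometext2'], ['moretext', 'moretext1', 'moretext2']]
print(header)  # ['a', 'b']
print(rows)    # [['1', '2'], ['3', '4']]
```

With a real file, open it with newline='' as the csv module documentation recommends, and pass the handle to split_metadata instead of the StringIO.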

You can attach a string to the DataFrame like this:

df.comments = 'My Comments'

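But note:

While you can attach attributes to a DataFrame, operations performed on the DataFrame (such as groupby, pivot, join or loc, to name just a few) may return a new DataFrame without the metadata attached. Pandas does not yet have a robust way to propagate metadata attached to DataFrames.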
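A small sketch of that caveat with made-up data: the ad-hoc attribute lives only on the original object, so a filtered copy does not have it. It also shows DataFrame.attrs, a dict that pandas added in version 1.0 for this kind of metadata. The pandas documentation marks attrs as experimental, and which operations carry it over depends on the pandas version, so treat its propagation as something to check against your installed version.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, -2, 3], "b": [4, 5, 6]})

# Ad-hoc attribute, as in the answer above: a plain Python attribute on this object only.
df.comments = "My Comments"

# Boolean indexing builds a new DataFrame; pandas does not copy unknown attributes to it.
subset = df[df["a"] > 0]
has_comments = hasattr(subset, "comments")  # False: the attribute was not carried over

# DataFrame.attrs (pandas >= 1.0, documented as experimental) is a dict intended for metadata.
df.attrs["comments"] = ["sometext, sometext2", "moretext, moretext1, moretext2"]
print(df.attrs["comments"])
```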

Answer 2

You can read the metadata lines first, and then use read_csv on the rest of the file:

import pandas as pd

with open('f.csv') as file:
    # read the first 2 rows as metadata
    header = [file.readline() for _ in range(2)]
    meta = [value.strip().split(',') for value in header]
    print(meta)
    # [['sometext', ' sometext2'], ['moretext', ' moretext1', ' moretext2']]

    # read_csv continues from the current file position, i.e. after the metadata
    df = pd.read_csv(file)
    print(df)
    #         header
    # 0  actual data
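The same idea can be wrapped in a helper that returns the DataFrame with the comment lines attached. A sketch under these assumptions: read_csv_with_metadata is a hypothetical name, the number of metadata lines is known in advance, the raw lines are stored in DataFrame.attrs (experimental in pandas), and io.StringIO replaces f.csv so the example runs on its own.

```python
import io

import pandas as pd

def read_csv_with_metadata(f, n_meta=2, **read_csv_kwargs):
    """Hypothetical helper: read n_meta leading lines as metadata, parse the rest as CSV."""
    # Keep each metadata line verbatim, minus the trailing newline.
    meta = [f.readline().rstrip("\n") for _ in range(n_meta)]
    # read_csv starts from the handle's current position, just after the metadata.
    df = pd.read_csv(f, **read_csv_kwargs)
    df.attrs["metadata"] = meta
    return df

sample = io.StringIO(
    "sometext, sometext2\n"
    "moretext, moretext1, moretext2\n"
    "header\n"
    "actual data\n"
)
df = read_csv_with_metadata(sample)
print(df.attrs["metadata"])  # ['sometext, sometext2', 'moretext, moretext1, moretext2']
print(df)
```

Storing the raw lines rather than split fields leaves the decision on how to parse the metadata to the caller; any extra keyword arguments are passed straight through to read_csv.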
