Question

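I would like to write some comments in the CSV files I create with pandas. I haven't found any option for this in DataFrame.to_csv (even though read_csv can skip comments), nor in the standard csv module. I can open the file, write the comments (lines starting with #), and then pass it to to_csv. Is there a better option?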

Answer 1

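df.to_csv accepts a file object. So you can open a file in 'a' (append) mode, write your comments, and pass the file to the DataFrame's to_csv function.

For example: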

In [36]: df = pd.DataFrame({'a':[1,2,3], 'b':[1,2,3]})

In [37]: f = open('foo', 'a')

In [38]: f.write('# My awesome comment\n')

In [39]: f.write('# Here is another one\n')

In [40]: df.to_csv(f)

In [41]: f.close()

In [42]: more foo
# My awesome comment
# Here is another one
,a,b
0,1,1
1,2,2
2,3,3
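The same idea can be sketched as a script that also reads the file back. This is a minimal illustration, not part of the original answer: the temporary path is a hypothetical location, the file is opened in 'w' mode instead of 'a' so that rerunning the script does not keep appending to an existing file, and newline='' is passed so that line endings are left to the CSV writer. Reading back relies on the comment parameter of read_csv that the question mentions.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3]})

# Hypothetical output location, used only for this illustration
path = os.path.join(tempfile.mkdtemp(), 'foo.csv')

# 'w' starts from an empty file; newline='' leaves line endings to the CSV writer
with open(path, 'w', newline='') as f:
    f.write('# My awesome comment\n')
    f.write('# Here is another one\n')
    df.to_csv(f)

# comment='#' tells read_csv to skip the comment lines written above
restored = pd.read_csv(path, comment='#', index_col=0)
print(restored)
```

The with block closes the file on exit, so no explicit f.close() is needed.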

Answer 2

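An alternative to @Vor's solution is to first write the comment to the file and then use mode='a' with to_csv() to append the contents of the data frame to the same file. According to my benchmarks (below), this takes about as long as opening the file in append mode, adding the comment, and then passing the file handler to pandas (as in @Vor's answer). The similar timings make sense, considering that this is what pandas does internally: DataFrame.to_csv() calls CSVFormatter.save(), which uses _get_handles() to open the file via open().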

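Also, it is convenient to handle file IO through the with statement, which ensures that opened files are closed once you are done with them and leave the with block. See the examples in the benchmarks below.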

Read in the test data

import pandas as pd
# Read in the iris data frame from the seaborn GitHub location
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
# Create a bigger data frame
while iris.shape[0] < 100000:
    # DataFrame.append was removed in pandas 2.0; pd.concat doubles the frame the same way
    iris = pd.concat([iris, iris])
# `iris.shape` is now (153600, 5)

1. Append with the same file handler

%%timeit -n 5 -r 5

# Open a file in append mode to add the comment
# Then pass the file handle to pandas
with open('test1.csv', 'a') as f:
    f.write('# This is my comment\n')
    iris.to_csv(f)

972 ms ± 31.9 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)

2. Reopen the file with `to_csv(mode='a')`

%%timeit -n 5 -r 5

# Open a file in write mode to add the comment
# Then close the file and reopen it with pandas in append mode
with open('test2.csv', 'w') as f:
    f.write('# This is my comment\n')
iris.to_csv('test2.csv', mode='a')

949 ms ± 19.3 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)
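As a quick sanity check that the two approaches produce the same file, here is a small sketch. It is an illustration under assumptions rather than part of the original benchmark: it uses a tiny data frame and hypothetical temporary paths, and it opens the first file in 'w' mode, because with 'a' every repeated run (including each timeit loop above) adds another copy of the data to the existing file.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# Hypothetical temporary paths, used only for this comparison
tmpdir = tempfile.mkdtemp()
path1 = os.path.join(tmpdir, 'test1.csv')
path2 = os.path.join(tmpdir, 'test2.csv')

# Approach 1: one file handle for both the comment and the data
with open(path1, 'w', newline='') as f:
    f.write('# This is my comment\n')
    df.to_csv(f)

# Approach 2: write the comment, close the file, let pandas reopen it in append mode
with open(path2, 'w') as f:
    f.write('# This is my comment\n')
df.to_csv(path2, mode='a')

# Compare the resulting text of both files
with open(path1) as f1, open(path2) as f2:
    same = f1.read() == f2.read()
print(same)
```

Since both routes write the same content, the choice between them is mostly a matter of style, which is consistent with the near-identical timings above.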
