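On Windows, the standard EOL (end-of-line) terminator is a carriage return followed by a line feed (\r\n). That is what I get when I call the to_csv method on a DataFrame. However, when I use to_csv to write a gzip-compressed file, every line in that file ends with two carriage returns followed by a line feed (\r\r\n).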
Here is the code that reproduces it:
import pandas as pd, sys, gzip, zlib

print("python:", sys.version)
print("pandas:", pd.__version__)
print("zlib  :", zlib.ZLIB_RUNTIME_VERSION)

df = pd.DataFrame(data={'c0': ['a', 'b'], 'c1': ['c', 'd']})
print(df)

# Under Windows the EOL marker is \r\n, so this works as expected
df.to_csv('df.csv', index=None)
with open('df.csv', 'rb') as f:
    print("df.csv, default terminator   :", f.read())

# With gzip it writes \r\r\n as EOL, which looks like a bug
df.to_csv('df.csv.gz', index=None)
with gzip.open('df.csv.gz', 'rb') as f:
    print("df.csv.gz, default terminator:", f.read())

# When specifying a single '\n', that is exactly what is written
# (newer pandas versions spell this parameter lineterminator)
df.to_csv('df.csv', index=None, line_terminator='\n')
with open('df.csv', 'rb') as f:
    print("df.csv, '\\n' terminator      :", f.read())

# When specifying a single '\n' with gzip, it writes \r\n as EOL, as desired
df.to_csv('df.csv.gz', index=None, line_terminator='\n')
with gzip.open('df.csv.gz', 'rb') as f:
    print("df.csv.gz, '\\n' terminator   :", f.read())
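This is clearly related to the issue discussed earlier in "CSV in Python adding an extra carriage return, on Windows". My problem is that the compressed file behaves differently from the uncompressed one. Is this a known issue?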
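For reference, the doubled carriage return can be reproduced without pandas or gzip. The sketch below uses only the standard library and assumes the cause is the one from the linked question: text that already ends in \r\n is written through a text-mode stream that translates each \n to \r\n. The helper name write_text is made up for illustration, and passing newline="\r\n" explicitly stands in for the Windows default. Whether the gzip path in to_csv actually goes through such a stream is my assumption, not something I have confirmed.

```python
import io


def write_text(data: str, newline) -> bytes:
    """Write data through a text-mode wrapper and return the raw bytes produced."""
    buf = io.BytesIO()
    wrapper = io.TextIOWrapper(buf, encoding="utf-8", newline=newline)
    wrapper.write(data)
    wrapper.flush()
    raw = buf.getvalue()
    wrapper.detach()  # keep the wrapper from closing buf behind our back
    return raw


# A row that already carries the Windows terminator
row = "c0,c1\r\n"

# newline="\r\n" emulates Windows text mode: every "\n" becomes "\r\n" on write
translated = write_text(row, newline="\r\n")

# newline="" disables translation, so the bytes are written unchanged
untranslated = write_text(row, newline="")

print("with translation   :", translated)
print("without translation:", untranslated)
```

If that is what happens, it would also explain why passing a single '\n' as the terminator yields \r\n in the gzip file.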