I am importing a 100 MB CSV file (950,630 rows) with the pandas read_csv
function. I noticed that it is about 3 times faster if header is set to None. Any idea why?
import pandas as pd
import time
# job 1
start=time.time()
df=pd.read_csv("data.txt",sep=',', engine='c', header=None, na_filter=False, low_memory=False)
df.columns = df.iloc[0]
df=df.drop(df.index[0])
print("Job 1 took:",time.time()-start,"sec")
print(df.index.argmax())
# job 2
start2=time.time()
df2=pd.read_csv("data.txt",sep=',', engine='c', header=0,na_filter=False, low_memory=False)
print("Job 2 took:",time.time()-start2,"sec")
print(df2.index.argmax())
Job 1 took: 1.7068684101104736 sec
Job 2 took: 5.8732721090093994 sec
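One thing that may be worth checking is whether the two jobs actually produce the same result. This is only a guess, not something the timings above confirm: with header=None the header row is read as data, so every column contains at least one string and pandas may keep the whole column as text instead of converting it to numbers, which would skip the type-conversion work that Job 2 does. The sketch below uses a tiny in-memory CSV as a stand-in for data.txt (the column names and values are made up) and compares the resulting dtypes.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.txt: one header row, two data rows.
csv_text = "a,b,c\n1,2.5,x\n3,4.5,y\n"

# Job 1 style: the header row is read as ordinary data,
# then promoted to column names afterwards.
df = pd.read_csv(io.StringIO(csv_text), sep=",", header=None, na_filter=False)
df.columns = df.iloc[0]
df = df.drop(df.index[0])

# Job 2 style: the parser consumes the header row itself.
df2 = pd.read_csv(io.StringIO(csv_text), sep=",", header=0, na_filter=False)

# Compare the column types produced by each approach.
print(df.dtypes)
print(df2.dtypes)
```

If the dtypes differ on the real file as well, the two jobs are not doing equivalent work, and Job 1 would still need an explicit conversion step (for example pd.to_numeric on the relevant columns) before the timings can be compared fairly.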