I use the following function to concatenate a large number of CSV files:
import pandas as pd

def concatenate(files):  # input is an array of filenames
    files = sorted(files)
    merged = pd.DataFrame()
    for file in files:
        print("concatenating " + file)
        if file.endswith('FulltimeSimpleOpt.csv'):  # only consider those filenames
            # filenames look like veh<id>_year<yyyy>_..._FulltimeSimpleOpt.csv
            filenamearray = file.split("_")
            f = pd.read_csv(file, index_col=0)
            f.loc[:, 'Vehicle'] = filenamearray[0].replace("veh", "")
            f.loc[:, 'Year'] = filenamearray[1].replace("year", "")
            if "timelimit" in file:
                f.loc[:, 'Timelimit'] = "1"
            else:
                f.loc[:, 'Timelimit'] = "0"
            merged = pd.concat([merged, f], axis=0)
    merged.to_csv('merged.csv')
The problem with this function is that it does not handle a large number of files (30,000) well. I tried it with a sample of 100 files and it completed correctly. With 30,000 files, however, the script slows down at some point and then crashes.
How can I handle this many files more efficiently with pandas in Python?
Answer 0 (score: 7)
First collect the DataFrames in a list, then concatenate them once:
What you are doing now grows your DataFrame incrementally through repeated concatenation, which copies all previously accumulated rows on every iteration. Building a list of DataFrames and then concatenating them all in a single call avoids that repeated copying and scales much better.
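For completeness, a hypothetical call site, assuming the CSV files live in the current working directory (the glob pattern is an assumption, not part of the original post):

import glob

# Hypothetical invocation: gather candidate filenames and merge them.
# The pattern is an assumption; the function filters on the suffix anyway.
concatenate(glob.glob("*.csv"))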