My data files are named like this:
ABE200501.dat
ABE200502.dat
ABE200503.dat
...
So I first merge these files into all.dat, doing some cleanup along the way:
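fout = open("all.dat", "w")
for year in range(2000, 2017):
    for month in range(1, 13):
        try:
            # Files are named ABE<year><two-digit month>.dat
            for line in open("ABE" + str(year) + "%02d" % month + ".dat"):
                # Replace brackets, double quotes and backticks with spaces
                fout.write(line.replace("[", " ").replace("]", " ").replace('"', " ").replace('`', " "))
        except IOError:
            # Skip months that have no data file
            pass
fout.close()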
I then read the merged file with pandas:
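import pandas as pd
# error_bad_lines was deprecated in pandas 1.3 and removed in 2.0; use on_bad_lines='skip' there
df = pd.read_csv("all.dat", skipinitialspace=True, error_bad_lines=False, sep=' ',
                 names=['stationID','time','vis','day_type','vis2','day_type2','dir','speed','dir_max','speed_max','visual_range', 'unknown'])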
Is it possible to keep the combined file in RAM instead of writing it to my hard disk? That would save me a lot of unnecessary disk space.
Thanks!
Answer 0 (score: 1)
The StringIO module lets you treat a string as a file.
Example from the documentation:
import StringIO
output = StringIO.StringIO()
output.write('First line.\n')
print >>output, 'Second line.'
# Retrieve file contents -- this will be
# 'First line.\nSecond line.\n'
contents = output.getvalue()
# Close object and discard memory buffer --
# .getvalue() will now raise an exception.
output.close()
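The example above uses Python 2 syntax. On Python 3 the StringIO module no longer exists and the same class is available as io.StringIO. A minimal sketch of the same example, assuming Python 3:

```python
import io

# In-memory text buffer that behaves like a file opened for writing
output = io.StringIO()
output.write("First line.\n")
# print() can target any file-like object through its file argument
print("Second line.", file=output)

# Retrieve everything written so far as a single string
contents = output.getvalue()

# Close the buffer and discard its memory; getvalue() would now raise ValueError
output.close()
```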
For your own code:
fout = StringIO.StringIO()
# treat fout as a file handle like usual:
# parse the input files, writing to fout
contents = fout.getvalue()  # the merged text, held in memory as a string
fout.close()
# ...
# wrap the string in a new StringIO so it can be read like a file
fin = StringIO.StringIO(contents)
df = pd.read_csv(fin, skipinitialspace=True, error_bad_lines=False, sep=' ',
                 names=['stationID','time','vis','day_type','vis2','day_type2','dir','speed','dir_max','speed_max','visual_range', 'unknown'])
pandas accepts both path strings and file handles as input.
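Putting the pieces together, here is a rough sketch of the merge step writing into memory instead of all.dat. It assumes Python 3; the helper names clean and merge_to_buffer are made up for illustration and are not part of pandas or the standard library.

```python
import io


def clean(line):
    """Replace brackets, double quotes and backticks with spaces."""
    for ch in '[]"`':
        line = line.replace(ch, " ")
    return line


def merge_to_buffer(paths):
    """Concatenate the cleaned contents of the given files into an in-memory buffer."""
    buffer = io.StringIO()
    for path in paths:
        try:
            with open(path) as fin:
                for line in fin:
                    buffer.write(clean(line))
        except FileNotFoundError:
            # Skip months that have no data file
            continue
    # Rewind so the next reader starts at the beginning of the data
    buffer.seek(0)
    return buffer


# File names follow the ABE<year><two-digit month>.dat pattern from the question
paths = ["ABE%d%02d.dat" % (year, month)
         for year in range(2000, 2017)
         for month in range(1, 13)]
merged = merge_to_buffer(paths)
```

The returned buffer can be handed to pd.read_csv in place of the "all.dat" path, with the same remaining arguments as in the question. Rewinding with seek(0) avoids the extra getvalue() copy, although the whole data set still has to fit in memory alongside the resulting DataFrame.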