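In Python 2.7.10 I defined a function that imports five CSV files (each has 16 variables and is about 120 MB), and it works. I then picked four time variables to convert to datetime: the first three converted successfully, but the last one failed with a MemoryError. The function I defined is: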
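import csv
from collections import defaultdict

def reddat(filename, year1, year2):
    bigdata = defaultdict(list)
    for i in range(year1, year2):
        string = filename + str(i) + ".csv"
        with open(string, 'rb') as f:  # binary mode, as the csv module expects on Python 2
            reader = csv.reader(f)
            headers = next(reader)  # same as reader.next() on Python 2, and also valid on Python 3
            data1 = {h: [] for h in headers}
            for row in reader:
                for h, v in zip(headers, row):
                    data1[h].append(v)
        # store this year's list under each column name, so bigdata[h] is a list of per-year lists
        for h in headers:
            bigdata[h].append(data1[h])
    return bigdata

dataall = reddat("Calls_for_Service_", 2011, 2016)
## This function works: it imports five years of data (2011-2015) and combines them into one dictionary, dataall ##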
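One way to cut memory, sketched under assumptions: only four of the 16 columns are used later, so a reader could keep just those columns and build one flat list per column directly, instead of holding every column as a list of per-year lists. The function name `read_columns` and its `wanted` parameter are hypothetical; the sketch uses Python 3 text-mode CSV handling, and on Python 2.7 the file would be opened with `'rb'` as in `reddat`.

```python
import csv

def read_columns(prefix, year1, year2, wanted):
    """Read only the columns named in `wanted` from prefix<year>.csv files
    for year1 <= year < year2, returning one flat list per column."""
    columns = {name: [] for name in wanted}
    for year in range(year1, year2):
        path = prefix + str(year) + ".csv"
        # Python 3 text mode; on Python 2.7 use open(path, 'rb') instead.
        with open(path, newline="") as f:
            reader = csv.reader(f)
            headers = next(reader)
            # Map each wanted column name to its position in this file's header.
            positions = [(name, headers.index(name)) for name in wanted]
            for row in reader:
                for name, idx in positions:
                    columns[name].append(row[idx])
    return columns
```

Usage would look like `times = read_columns("Calls_for_Service_", 2011, 2016, ["TimeCreate", "TimeDispatch", "TimeArrive", "TimeClosed"])`, after which `times["TimeClosed"]` is already a flat list and the other 12 columns are never stored. Unlike the `zip` in `reddat`, a row shorter than the header raises `IndexError` here instead of being silently truncated.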
Then I selected four variables from dataall:
TimeCreate = []
TimeDispatch = []
TimeArrive = []
TimeClosed = []
# concatenate the per-year lists into one flat list per variable
for i in range(0, len(dataall['TimeCreate'])):
    TimeCreate += dataall['TimeCreate'][i]
    TimeDispatch += dataall['TimeDispatch'][i]
    TimeArrive += dataall['TimeArrive'][i]
    TimeClosed += dataall['TimeClosed'][i]
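For reference, that flattening loop is equivalent to `itertools.chain.from_iterable` over each column's per-year lists. A small sketch with a made-up stand-in for `dataall`; note that neither version copies the strings themselves. The flat list only holds references to the same string objects, and the 12 unused columns stay in memory for as long as `dataall` exists.

```python
from itertools import chain

# Hypothetical stand-in for dataall: each key maps to one list per yearly file.
dataall = {"TimeClosed": [["a", "b"], ["c"], ["d", "e"]]}

# chain.from_iterable walks the per-year lists in order, so list(...) gives
# the same flat list as the `+=` loop, in a single pass.
TimeClosed = list(chain.from_iterable(dataall["TimeClosed"]))
```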
Now I have the four variables from dataall as lists of strings, and I want to convert them to datetime. I defined another function:
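import datetime as dt
import pandas as pd

def func(x):
    try:
        return dt.datetime.strptime(x, "%m/%d/%Y %I:%M:%S %p")
    except ValueError:  # string does not match the format (e.g. empty field)
        return pd.NaT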
I converted the four string lists to datetime lists:
TimeCreatenew=[func(d) for d in TimeCreate]
TimeDispatchnew=[func(d) for d in TimeDispatch]
TimeArrivenew=[func(d) for d in TimeArrive]
TimeClosednew=[func(d) for d in TimeClosed]
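A list of datetime objects stores a pointer per entry plus a separate datetime object per entry, on top of the string list it was built from. If memory is the bottleneck, one alternative (a swapped-in technique, not what the post does) is to store each timestamp as a float of seconds since 1970-01-01 in a stdlib `array('d')`, which packs 8 bytes per entry, with NaN standing in for `pd.NaT`. The helper `to_seconds_array` and the constants `FMT` and `EPOCH` are hypothetical names; the calls used also exist in Python 2.7.

```python
import datetime as dt
from array import array

FMT = "%m/%d/%Y %I:%M:%S %p"      # same format string as func
EPOCH = dt.datetime(1970, 1, 1)   # naive reference point, no time zone handling

def to_seconds_array(strings):
    """Parse timestamp strings into an array('d') of seconds since EPOCH.

    Unparsable entries become NaN, playing the role pd.NaT plays in func.
    Each value is stored inline as an 8-byte double instead of as a
    pointer to a separate datetime object."""
    out = array("d")
    for s in strings:
        try:
            out.append((dt.datetime.strptime(s, FMT) - EPOCH).total_seconds())
        except (TypeError, ValueError):
            out.append(float("nan"))
    return out
```

Usage would be `TimeClosednew = to_seconds_array(TimeClosed)`; an individual value can be turned back into a datetime on demand with `EPOCH + dt.timedelta(seconds=x)`.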
However, TimeCreatenew, TimeDispatchnew and TimeArrivenew work fine, but when converting TimeClosed to TimeClosednew, Python says:
Traceback (most recent call last):
  File "C:\Users\....\DataScience\scriptnew.py", line 65, in <module>
    TimeClosednew=[func(d) for d in TimeClosed]
MemoryError
My Python 2.7.10 is 32-bit. How can I fix this? Or is my function "reddat" just inefficient? Thanks a lot.
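To confirm which build is actually running: a 32-bit interpreter is limited by the 32-bit address space of its process (typically about 2 GB of user address space on Windows by default), no matter how much RAM the machine has. A quick check that works on both Python 2.7 and 3; the helper name `python_bits` is hypothetical.

```python
import struct
import sys

def python_bits():
    """Return 32 or 64: the pointer size of the running interpreter, in bits."""
    return struct.calcsize("P") * 8

print(sys.version)
print("%d-bit interpreter" % python_bits())
```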
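I then ran the same code with Python 3.5 from Anaconda3 (64-bit), and it completed without a memory error. I think Python 2.7.10 (32-bit) may not be able to handle data this large. If anyone knows how to make this work under Python 2.7.10, please share your ideas. Thanks a lot.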