Question

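In Python 2.7.10 I defined a function that imports five CSV files (each with 16 variables and about 120 MB in size), and it works. I then selected four time variables to convert to datetime. The first three converted successfully, but the last one failed with a memory error. The function I defined is: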

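import csv
from collections import defaultdict

def reddat(filename,year1,year2):
    # Maps each header to a list holding one list of values per year.
    bigdata=defaultdict(list)
    for i in range(year1,year2):
        string=filename+str(i)+".csv"
        # Python 2: the csv module expects files opened in binary mode.
        with open(string,'rb') as f:
            reader=csv.reader(f)
            headers=next(reader)
            data1 = {h:[] for h in headers}
            for row in reader:
                for h, v in zip(headers, row):
                    data1[h].append(v)
        for h in headers:
            bigdata[h].append(data1[h])
    return bigdata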

dataall=reddat("Calls_for_Service_",2011,2016)
# This call imports five years of data and combines them into one dictionary, dataall.

I then selected four variables from dataall:

TimeCreate=[]
TimeDispatch=[]
TimeArrive=[]
TimeClosed=[]
for i in range(0,len(dataall['TimeCreate'])):
    TimeCreate+=dataall['TimeCreate'][i]
    TimeDispatch+=dataall['TimeDispatch'][i]
    TimeArrive+=dataall['TimeArrive'][i]
    TimeClosed+=dataall['TimeClosed'][i]
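
Since only four of the 16 columns end up being used, one way to lower peak memory is to keep just those columns while reading. The following is a minimal sketch, not the asker's method: `read_columns` and `TIME_COLUMNS` are hypothetical names, it is written for Python 3 (under Python 2 the file would be opened with 'rb' as in the original), and it assumes every file contains the four headers named in the question.

```python
import csv
from collections import defaultdict

# Column names taken from the question; adjust if the files differ.
TIME_COLUMNS = ("TimeCreate", "TimeDispatch", "TimeArrive", "TimeClosed")

def read_columns(prefix, year1, year2, columns=TIME_COLUMNS):
    """Return {column: flat list of strings} for the requested columns only."""
    selected = defaultdict(list)
    for year in range(year1, year2):
        path = "{}{}.csv".format(prefix, year)
        # newline="" is the Python 3 way to open a file for the csv module.
        with open(path, newline="") as f:
            # DictReader yields one row at a time, so the other 12 columns
            # are never accumulated in memory.
            for row in csv.DictReader(f):
                for name in columns:
                    selected[name].append(row[name])
    return selected
```

Called as read_columns("Calls_for_Service_", 2011, 2016), this would return flat lists directly, so the separate concatenation loop is not needed. Whether it is enough to avoid the MemoryError on a 32-bit interpreter depends on the data and was not measured here.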

Now the four variables selected from dataall are lists of strings, and I want to convert them to datetime format. I defined another function as follows:

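import datetime as dt
import pandas as pd

def func(x):
    # Parse a 12-hour timestamp; return NaT when the value does not match.
    try:
        return dt.datetime.strptime(x, "%m/%d/%Y %I:%M:%S %p")
    except (ValueError, TypeError):
        return pd.NaT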

I converted the four lists of strings to lists of datetimes:

TimeCreatenew=[func(d) for d in TimeCreate]
TimeDispatchnew=[func(d) for d in TimeDispatch]
TimeArrivenew=[func(d) for d in TimeArrive]
TimeClosednew=[func(d) for d in TimeClosed]

However, "TimeCreatenew", "TimeDispatchnew", and "TimeArrivenew" worked fine, but when "TimeClosednew" was being converted, Python reported:

Traceback (most recent call last):
File "C:\Users\....\DataScience\scriptnew.py" line 65, in <module>  
TimeClosednew=[func(d) for d in TimeClosed]
MemoryError

My Python 2.7.10 is 32-bit. How can I solve this problem? Or is my function "reddat" not working well? Thank you very much.
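
Because the interpreter is 32-bit, the process can address only a few gigabytes in total (by default about 2 GB of user address space on Windows), no matter how much RAM is installed. A minimal sketch for confirming which build is running, using only the standard library; `interpreter_bits` is a hypothetical helper name:

```python
import struct
import sys

def interpreter_bits():
    """Return the pointer width of the running interpreter (32 or 64)."""
    # "P" is the struct format code for a C pointer (void *).
    return struct.calcsize("P") * 8

if __name__ == "__main__":
    print("Python %s, %d-bit" % (sys.version.split()[0], interpreter_bits()))
    # sys.maxsize is 2**31 - 1 on a 32-bit build and 2**63 - 1 on a 64-bit build.
    print("sys.maxsize > 2**32: %s" % (sys.maxsize > 2**32))
```

The same check works under both Python 2.7 and Python 3, so it can be run before and after switching interpreters.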

Found a solution using Python 3.5

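I used Python 3.5 from Anaconda3 (64-bit), and it solved the problem with no memory error. I think Python 2.7.10 may not be able to handle data this large. If anyone understands this problem and knows how to solve it under Python 2.7.10, please share your ideas. Thank you very much.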

Python 2.7.10 memory error

0 answers: