I get the following error when I run my Python script, which uses pandas, on a data set of 30 records. Can anyone tell me what went wrong?
Traceback (most recent call last):
  File "extractyooochoose2.py", line 32, in <module>
    totalitems = [len(x) for x in clicksdat.groupby('Sid')['itemid'].unique()]
  File "<string>", line 13, in unique
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/pandas/core/groupby.py", line 620, in wrapper
    raise ValueError
The code and the data are shown below.
import pandas as pd
import datetime as dt
clickspath='/tmp/gensim/yoochoose/yoochoose-clicks.dat'
buyspath='/tmp/gensim/yoochoose/yoochoose-buys.dat'
clicksdat=pd.read_csv(clickspath,header=None,dtype={'itemid': pd.np.str_,'Sid':pd.np.str_,'Timestamp':pd.np.str_,'itemcategory':pd.np.str_})
clicksdat.columns=['Sid','Timestamp','itemid','itemcategory']
buysdat=pd.read_csv(buyspath,header=None)
buysdat.columns=['Sid','Timestamp','itemid','price','qty']
segment={}
for i in range(24):
    if i<7:
        segment[i]='EM'
    elif i<10:
        segment[i]='M'
    elif i<13:
        segment[i]='A'
    elif i<18:
        segment[i]='E'
    elif i<23:
        segment[i]='N'
    elif i<25:
        segment[i]='MN'
#*******************************************
buyersession=buysdat.Sid.unique()
clickersession=clicksdat.Sid.unique()
maxtemp=[(dt.datetime.strptime(x,"%Y-%m-%dT%H:%M:%S.%fZ")) for x in clicksdat.groupby('Sid')['Timestamp'].max()]
mintemp=[dt.datetime.strptime(x,"%Y-%m-%dT%H:%M:%S.%fZ") for x in clicksdat.groupby('Sid')['Timestamp'].min()]
duration=[int((a-b).total_seconds()) for a,b in zip(maxtemp,mintemp)]
day=[x.day for x in maxtemp]
month=[x.month for x in maxtemp]
noofnavigations=[clicksdat.groupby('Sid').count().Timestamp][0]
totalitems=[len(x) for x in clicksdat.groupby('Sid')['itemid'].unique()]
totalcats=[len(x) for x in clicksdat.groupby('Sid')['itemcategory'].unique()]
timesegment= [segment[x.hour]for x in maxtemp]
segmentchange=[1 if (segment[x.hour]!=segment[y.hour]) else 0 for x,y in zip(maxtemp,mintemp)]
purchased=[x in buyersession for x in noofnavigations.index.values ]
percentile_list = pd.DataFrame({'purchased' : purchased,'duration':duration,'day':day,'month':month,'noofnavigations':noofnavigations,'totalitems':totalitems,'totalcats':totalcats,'timesegment':timesegment,'segmentchange':segmentchange })
percentile_list.to_csv('/tmp/gensim/yoochoose/yoochoose-clicks1001.csv')
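For comparison, here is a minimal sketch of the same per-session features computed with grouped aggregations instead of list comprehensions over groupby(...).unique(). It is not a confirmed diagnosis of the ValueError. The names hour_to_segment and session_features are hypothetical helpers introduced only for this illustration, and the sketch assumes a reasonably recent pandas rather than the Python 2.7 install shown in the traceback.

```python
import pandas as pd


def hour_to_segment(hour):
    """Map an hour (0-23) to the same labels as the segment dict in the script."""
    if hour < 7:
        return 'EM'
    elif hour < 10:
        return 'M'
    elif hour < 13:
        return 'A'
    elif hour < 18:
        return 'E'
    elif hour < 23:
        return 'N'
    return 'MN'


def session_features(clicks):
    """Hypothetical helper: one row of features per session id (Sid).

    Assumes `clicks` has the columns Sid, Timestamp, itemid, itemcategory,
    with Timestamp given as strings like 2014-04-07T10:51:09.277Z.
    """
    clicks = clicks.copy()
    # Parse the timestamps once instead of calling strptime per session.
    clicks['Timestamp'] = pd.to_datetime(
        clicks['Timestamp'], format="%Y-%m-%dT%H:%M:%S.%fZ")

    grouped = clicks.groupby('Sid')
    first = grouped['Timestamp'].min()
    last = grouped['Timestamp'].max()

    features = pd.DataFrame({
        'duration': (last - first).dt.total_seconds().astype(int),
        'day': last.dt.day,
        'month': last.dt.month,
        'noofnavigations': grouped['Timestamp'].count(),
        # nunique() counts distinct values per group directly,
        # replacing len(x) over groupby(...).unique().
        'totalitems': grouped['itemid'].nunique(),
        'totalcats': grouped['itemcategory'].nunique(),
        'timesegment': last.dt.hour.map(hour_to_segment),
    })
    # 1 if the session starts and ends in different time segments, else 0.
    start_segment = first.dt.hour.map(hour_to_segment)
    features['segmentchange'] = (
        start_segment != features['timesegment']).astype(int)
    return features
```

Calling session_features(clicksdat) would return a DataFrame indexed by Sid, so the purchased flag could then be added with features.index.isin(buyersession), provided both sides use the same dtype for the session id.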
Sample data is shown below.
sessionid,timestamp,itemid,category
1,2014-04-07T10:51:09.277Z,214536502,0
1,2014-04-07T10:54:09.868Z,214536500,0
1,2014-04-07T10:54:46.998Z,214536506,0
1,2014-04-07T10:57:00.306Z,214577561,0
2,2014-04-07T13:56:37.614Z,214662742,0
2,2014-04-07T13:57:19.373Z,214662742,0
2,2014-04-07T13:58:37.446Z,214825110,0
2,2014-04-07T13:59:50.710Z,214757390,0
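To make the problem easier to reproduce, here is a small self-contained sketch that loads the sample rows above from an in-memory string. One thing it illustrates: in the script, the dtype mapping is keyed by column names that are only assigned after read_csv returns, so it may not match any column at read time. Passing names= to read_csv, as assumed here, lets the string dtype apply by name. The SAMPLE and COLUMNS constants are introduced only for this illustration.

```python
import io

import pandas as pd

# The eight sample rows from the question, without the header line,
# since the real file is read with header=None.
SAMPLE = """1,2014-04-07T10:51:09.277Z,214536502,0
1,2014-04-07T10:54:09.868Z,214536500,0
1,2014-04-07T10:54:46.998Z,214536506,0
1,2014-04-07T10:57:00.306Z,214577561,0
2,2014-04-07T13:56:37.614Z,214662742,0
2,2014-04-07T13:57:19.373Z,214662742,0
2,2014-04-07T13:58:37.446Z,214825110,0
2,2014-04-07T13:59:50.710Z,214757390,0
"""

COLUMNS = ['Sid', 'Timestamp', 'itemid', 'itemcategory']

# names= assigns the column labels while reading, and dtype=str
# keeps every column as text.
clicks = pd.read_csv(io.StringIO(SAMPLE), header=None,
                     names=COLUMNS, dtype=str)

# Distinct items per session: Sid '1' has 4, Sid '2' has 3
# (item 214662742 appears twice in session 2).
items_per_session = clicks.groupby('Sid')['itemid'].nunique()
print(items_per_session)
```

On these eight rows the script's logic is easy to check by hand, which may help narrow down whether the error depends on the data or on the pandas version.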