我正在尝试运行以下代码。它适用于较小的数据量,但对于较大的数据量,几乎要花一天的时间。
任何可以帮助优化代码或可以告诉我该方法的人。我们可以使用apply lambda解决问题吗?
for index in df.index:
for i in df.index:
if ((df.loc[index,"cityId"]==df.loc[i,"cityId"]) & (df.loc[index,"landingPagePath"]==df.loc[i,"landingPagePath"]) &
(df.loc[index,"exitPagePath"]==df.loc[i,"exitPagePath"]) &
(df.loc[index,"campaign"]==df.loc[i,"campaign"]) &
(df.loc[index,"pagePath"]==df.loc[i,"previousPagePath"]) &
((df.loc[index,"dateHourMinute"]+timedelta(minutes=math.floor(df.loc[index,"timeOnPage"]/60))==df.loc[i,"dateHourMinute"]) |
(df.loc[index,"dateHourMinute"]==df.loc[i,"dateHourMinute"]) |
((df.loc[index,"dateHourMinute"]+timedelta(minutes=math.floor(df.loc[index,"timeOnPage"]/60))+timedelta(minutes=1))==df.loc[i,"dateHourMinute"]))
):
if(df.loc[i,"sess"]==0):
df.loc[i,'sess']=df.loc[index,'sess']
elif(df.loc[index,"sess"]>df.loc[i,"sess"] ):
df.loc[index,'sess']=df.loc[i,'sess']
elif(df.loc[index,"sess"]==0):
df.loc[index,'sess']=df.loc[i,'sess']
elif(df.loc[index,"sess"]<df.loc[i,"sess"] ):
x=df.loc[i,"sess"]
for q in df.index:
if(df.loc[q,"sess"]==x):
df.loc[q,"sess"]=df.loc[index,'sess']
else:
if (df.loc[index,"sess"]==0):
df.loc[index,'sess'] = max(df["sess"])+1
答案 0 :(得分:1)
看起来您正在尝试手动进行数据库“连接”,Pandas将此功能公开为merge
,使用它可以大大解决您的问题
我在遍及所有分支机构时遇到了麻烦,但是如果您使用merge
,然后也许要进行一些后期处理/过滤以获得最终答案,那么您应该能够获得大部分途径< / p>