How to efficiently combine DataFrames computed in a for loop: speed difference between join, merge, and concat

Asked: 2018-03-09 20:40:52

Tags: python dataframe join merge concat

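I am writing code to reduce the dimensionality of ~4000 time-series files. By dimensionality reduction I mean that I manually extract the mean and standard deviation for a set of time periods (four per day, by season, and by weekday/weekend).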

import pandas as pd

file = pd.read_csv('test/Res/1002', sep='\t', encoding='utf-8')
file.drop([col for col in file.columns if "Unnamed" in col], axis=1, inplace=True)  # drop stray "Unnamed" columns

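This is an excerpt: of the four periods of the day, the code below handles the overnight period. The data contains the columns [daycode, val, day, hm, weekday], and I select the rows for a given period based on the 'hm' value.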

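# Overnight period: hm values 47-48 plus 1-14.
fcond = 1 <= file['hm']
scond = file['hm'] <= 14
# .copy() so that later column assignments act on an independent frame, not a slice of `file`.
overnperiod = file[file['hm'].isin([47, 48]) | (fcond & scond)].copy()
overn_mean = overnperiod['val'].mean()
overn_std = overnperiod['val'].std()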
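
As a rough illustration of the same selection on data that anyone can run, here is a minimal sketch. The synthetic frame `demo`, the helper `overnight_mask`, and the reading of `hm` as a half-hour slot numbered 1 to 48 are assumptions made for the example, not part of the original data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one meter file.
# Assumption: hm is a half-hour slot index running from 1 to 48 each day.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "day": np.repeat(np.arange(1, 4), 48),
    "hm": np.tile(np.arange(1, 49), 3),
    "val": rng.random(3 * 48),
})


def overnight_mask(hm):
    """Return True for slots 47-48 and 1-14, the overnight period used above."""
    return hm.isin([47, 48]) | hm.between(1, 14)


# Select the overnight rows and compute the two summary statistics.
demo_overn = demo[overnight_mask(demo["hm"])].copy()
demo_overn_mean = demo_overn["val"].mean()
demo_overn_std = demo_overn["val"].std()
```

`Series.between` is inclusive on both ends by default, so it matches the `1 <= hm` and `hm <= 14` pair of conditions.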

overnperiod['daycode'] = pd.to_datetime(overnperiod['daycode'])
overnperiod['weekday'] = overnperiod['daycode'].dt.dayofweek

# dt.dayofweek: Monday=0 ... Sunday=6
fcond = 0 <= overnperiod['weekday']
scond = overnperiod['weekday'] <= 4
wd_P1 = overnperiod[fcond & scond]  # weekdays (Mon-Fri)
we_P1 = overnperiod[overnperiod['weekday'].isin([5, 6])]  # weekend (Sat-Sun)
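
One possible way to shorten the weekday/weekend split is to label each row and let `groupby` compute both means at once. This is only a sketch on made-up data: the frame `demo` and the `daytype` column are hypothetical names introduced for the example.

```python
import numpy as np
import pandas as pd

# Two weeks of made-up daily values starting on a Monday.
demo = pd.DataFrame({
    "daycode": pd.date_range("2018-01-01", periods=14, freq="D"),
    "val": np.arange(14, dtype=float),
})

# Monday=0 ... Sunday=6, so 5 and 6 are the weekend.
demo["weekday"] = demo["daycode"].dt.dayofweek
demo["daytype"] = np.where(demo["weekday"] >= 5, "weekend", "weekday")

# One groupby call yields the mean for both groups.
means = demo.groupby("daytype")["val"].mean()
wd_mean = means.loc["weekday"]
we_mean = means.loc["weekend"]
```

The same pattern would extend to the four daily periods by grouping on a period label as well, assuming such a label column is built first.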

Below is part of the lengthy code for the different seasons.

summer_P1 = overnperiod[overnperiod['day'].between(172, 264) | overnperiod['day'].between(537, 629)]
winter_P1 = overnperiod[overnperiod['day'].between(355, 443) | overnperiod['day'].between(720, 730) | overnperiod['day'].between(1, 79)]

Below are the features extracted from each meter's time-series data (4000 x 7 in total).

overn_PR = overn_mean/head_P 
breakf_PR = breakf_mean/head_P
day_PR = day_mean/head_P
evng_PR = evng_mean/head_P

std_year = overn_std/overn_mean + breakf_std/breakf_mean + day_std/day_mean + evng_std/evng_mean

# Weekday vs. weekend difference, summed over the four daily periods.
week_score = (abs(wd_P1['val'].mean() - we_P1['val'].mean())/overn_mean
              + abs(wd_P2['val'].mean() - we_P2['val'].mean())/breakf_mean
              + abs(wd_P3['val'].mean() - we_P3['val'].mean())/day_mean
              + abs(wd_P4['val'].mean() - we_P4['val'].mean())/evng_mean)

# Winter vs. summer difference, summed over the four daily periods.
season_score = (abs(winter_P1['val'].mean() - summer_P1['val'].mean())/overn_mean
                + abs(winter_P2['val'].mean() - summer_P2['val'].mean())/breakf_mean
                + abs(winter_P3['val'].mean() - summer_P3['val'].mean())/day_mean
                + abs(winter_P4['val'].mean() - summer_P4['val'].mean())/evng_mean)

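My idea now is to create one DataFrame per meter and somehow combine them all, keyed by meter ID. But since I can't see how to use the "append in the loop -> concat outside the loop" technique, which avoids the speed penalty of calling concat inside the loop, I'm not sure what options are left.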
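
Because each meter reduces to a handful of scalar features, one option that may fit is to collect a plain dict per meter inside the loop and build a single DataFrame after it. The sketch below assumes this; `extract_features` is a hypothetical stand-in for the feature code above, the synthetic frames replace the real `pd.read_csv` calls, and the meter IDs other than 1002 are made up.

```python
import numpy as np
import pandas as pd


def extract_features(df):
    """Hypothetical stand-in for the per-meter feature code (two features only)."""
    overn = df[df["hm"].isin([47, 48]) | df["hm"].between(1, 14)]
    return {"overn_mean": overn["val"].mean(), "overn_std": overn["val"].std()}


rng = np.random.default_rng(0)
rows = {}
for meter_id in ["1002", "1003", "1004"]:  # hypothetical meter IDs
    # In the real loop this would be pd.read_csv(...) for the meter's file.
    df = pd.DataFrame({
        "hm": np.tile(np.arange(1, 49), 2),
        "val": rng.random(96),
    })
    rows[meter_id] = extract_features(df)

# Build the result once, outside the loop:
# one row per meter, one column per feature.
features = pd.DataFrame.from_dict(rows, orient="index")
features.index.name = "meter_id"
```

With this layout no DataFrame is grown inside the loop at all, so the choice between join, merge, and concat does not arise for the feature table.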

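It seems to me (just intuition, I'm not sure) that join is faster, but that would mean creating an empty DataFrame and joining inside the loop, with an if condition.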

Beyond that, I would also like to know whether there is any speed difference between merge, concat, and join. Any ideas?
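
A small timing sketch can make the comparison concrete for the row-stacking case. It only contrasts calling `pd.concat` inside the loop with calling it once on a list; the frame sizes, the `meter_{i}` labels, and the function names are assumptions chosen for illustration, and the timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# 200 one-row frames with 7 feature columns each (made-up data).
rng = np.random.default_rng(0)
frames = [
    pd.DataFrame(rng.random((1, 7)), index=[f"meter_{i}"])
    for i in range(200)
]


def concat_in_loop():
    """Grow the result step by step; each call copies all rows gathered so far."""
    out = frames[0]
    for f in frames[1:]:
        out = pd.concat([out, f])
    return out


def concat_once():
    """Stack all frames in a single call."""
    return pd.concat(frames)


t_loop = timeit.timeit(concat_in_loop, number=3)
t_once = timeit.timeit(concat_once, number=3)
print(f"concat inside loop: {t_loop:.3f}s, concat once: {t_once:.3f}s")
```

Both functions produce the same table, so any difference in the printed times comes from the repeated copying in the loop version rather than from the result itself.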

0 answers:

No answers yet.