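I am writing code to reduce the dimensionality of ~4000 time-series files. By dimensionality reduction I mean that I manually extract the mean/standard deviation for a set of time periods (4 per day, by season, and by weekday/weekend).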
import pandas as pd

file = pd.read_csv('test/Res/1002', sep='\t', encoding='utf-8')
# Drop the stray "Unnamed" columns left over from the saved index
file.drop([col for col in file.columns if "Unnamed" in col], axis=1, inplace=True)
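Here is an excerpt: the code for the overnight period, one of the 4 periods of the day. The data has the columns [daycode, val, day, hm, weekday], and I extract the data for a given period based on the 'hm' value.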
fcond = 1 <= file['hm']
scond = file['hm'] <= 14
# .copy() so the column assignments below do not raise SettingWithCopyWarning
overnperiod = file[file['hm'].isin([47, 48]) | (fcond & scond)].copy()
overn_mean = overnperiod['val'].mean()
overn_std = overnperiod['val'].std()
overnperiod['daycode'] = pd.to_datetime(overnperiod['daycode'])
overnperiod['weekday'] = overnperiod['daycode'].dt.dayofweek
fcond = 0<= overnperiod['weekday']
scond = overnperiod['weekday'] <=4
wd_P1 = overnperiod[fcond & scond]
we_P1 = overnperiod[overnperiod['weekday'].isin([5,6])]
Below is part of the lengthy code for the different time periods.
summer_P1 = overnperiod[overnperiod['day'].between(172, 264) | overnperiod['day'].between(537, 629)]
winter_P1 = overnperiod[overnperiod['day'].between(355, 443) | overnperiod['day'].between(720, 730) | overnperiod['day'].between(1, 79)]
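Since the same filter-then-aggregate step repeats for every period, the pattern could be sketched as a small helper. This is only a rough sketch: `period_stats` and the toy data are hypothetical, and only the overnight 'hm' range (47, 48 and 1 to 14) is taken from my code above.

```python
import pandas as pd

def period_stats(df, hm_mask):
    """Mean/std of 'val' for the rows selected by hm_mask, plus weekday/weekend means."""
    period = df[hm_mask].copy()  # copy so the assignment below is safe
    period['daycode'] = pd.to_datetime(period['daycode'])
    is_weekend = period['daycode'].dt.dayofweek >= 5  # 5 = Saturday, 6 = Sunday
    return {
        'mean': period['val'].mean(),
        'std': period['val'].std(),
        'wd_mean': period.loc[~is_weekend, 'val'].mean(),
        'we_mean': period.loc[is_weekend, 'val'].mean(),
    }

# Toy data, made up for illustration (2010-01-04 is a Monday, 2010-01-09 a Saturday)
toy = pd.DataFrame({
    'daycode': ['2010-01-04', '2010-01-04', '2010-01-09', '2010-01-09'],
    'hm': [1, 20, 48, 14],
    'val': [1.0, 10.0, 3.0, 5.0],
})
overnight = toy['hm'].isin([47, 48]) | toy['hm'].between(1, 14)
stats = period_stats(toy, overnight)
```

The other three periods would pass a different 'hm' mask to the same helper; their boundaries are not shown here.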
Below are the features extracted from each meter's time-series data (4000 x 7 in total).
overn_PR = overn_mean/head_P
breakf_PR = breakf_mean/head_P
day_PR = day_mean/head_P
evng_PR = evng_mean/head_P
std_year = overn_std/overn_mean + breakf_std/breakf_mean + day_std/day_mean + evng_std/evng_mean
# Weekday vs. weekend contrast, summed over the four periods
week_score = abs(wd_P1['val'].mean() - we_P1['val'].mean())/overn_mean + abs(wd_P2['val'].mean() - we_P2['val'].mean())/breakf_mean + abs(wd_P3['val'].mean() - we_P3['val'].mean())/day_mean + abs(wd_P4['val'].mean() - we_P4['val'].mean())/evng_mean
# Winter vs. summer contrast, summed over the four periods
season_score = abs(winter_P1['val'].mean() - summer_P1['val'].mean())/overn_mean + abs(winter_P2['val'].mean() - summer_P2['val'].mean())/breakf_mean + abs(winter_P3['val'].mean() - summer_P3['val'].mean())/day_mean + abs(winter_P4['val'].mean() - summer_P4['val'].mean())/evng_mean
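My idea now is to create one dataframe per meter and then somehow combine all the dataframes under a header of the meter ID. But since I can't use the "append inside the loop, then concat once outside the loop" technique, and would have to accept the slowdown of calling concat inside the loop, I'm not sure whether that option is off the table.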
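To make the layout I have in mind concrete, here is a minimal sketch of collecting the per-meter results in a plain dict and combining them once. `extract_features`, the dummy feature values, and the meter IDs other than 1002 are hypothetical stand-ins for the real per-file extraction.

```python
import pandas as pd

def extract_features(meter_id):
    """Hypothetical stand-in: the real version would read the file and compute the 7 features."""
    return pd.Series({'overn_PR': 0.1 * meter_id, 'day_PR': 0.2 * meter_id})

meter_ids = [1002, 1003, 1004]  # made-up meter IDs

frames = {}
for meter_id in meter_ids:
    # Storing into a dict is cheap; nothing is copied at this point.
    frames[meter_id] = extract_features(meter_id)

# Combine once: the dict keys (meter IDs) become the column headers.
combined = pd.concat(frames, axis=1)

# Transposed view: one row per meter, one column per feature.
per_meter = combined.T
```

Here `pd.concat` is called a single time after the loop, so the loop itself only fills a dict.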
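To me (intuitively, I'm not sure) join seems faster, but it would mean creating an empty dataframe and joining onto it inside the loop, with if conditions.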
Other than that, I also wonder whether there is any speed difference between merge, concat, and join.
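One way I could measure this myself is a small timing harness along these lines. It only compares concat inside the loop against a single concat at the end; `make_row`, the column names, and `n = 200` are arbitrary assumptions, and the timings will depend on the machine and the pandas version.

```python
import timeit
import pandas as pd

def make_row(i):
    """Made-up one-row frame standing in for one meter's features."""
    return pd.DataFrame({'a': [i], 'b': [2 * i]}, index=[i])

def concat_in_loop(n):
    out = make_row(0)
    for i in range(1, n):
        # Each call builds a new frame from everything accumulated so far.
        out = pd.concat([out, make_row(i)])
    return out

def concat_once(n):
    # Collect the pieces in a list and combine them in a single call.
    return pd.concat([make_row(i) for i in range(n)])

n = 200  # arbitrary size for the comparison
t_loop = timeit.timeit(lambda: concat_in_loop(n), number=3)
t_once = timeit.timeit(lambda: concat_once(n), number=3)
print(f"concat in loop: {t_loop:.3f}s, concat once: {t_once:.3f}s")
```

The same harness could be extended with merge and join variants to compare those as well.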
Any ideas?