df_pairs:
city1 city2
0 sfo yyz
1 sfo yvr
2 sfo dfw
3 sfo ewr
Output of df_pairs.to_dict('records'):
[{'city1': 'sfo', 'city2': 'yyz'},
{'city1': 'sfo', 'city2': 'yvr'},
{'city1': 'sfo', 'city2': 'dfw'},
{'city1': 'sfo', 'city2': 'ewr'}]
data_df:
city 2016-02-02 00:00:00 2016-02-05 00:00:00 2016-02-01 00:00:00 2016-02-04 00:00:00 2016-02-03 00:00:00
0 sfo -33.63 -62.34 -35.70 -31.84 -33.87
1 yyz -24.31 -51.17 -22.07 -31.00 -23.00
2 yvr -24.31 -51.17 -22.07 -31.00 -23.00
3 dfw -32.17 -43.77 -34.84 0.27 -11.49
4 ewr -28.87 -59.66 -28.40 -32.94 -29.06
Output of data_df.to_dict('records'):
[{'city': 'sfo',
Timestamp('2016-02-02 00:00:00'): -33.63,
Timestamp('2016-02-05 00:00:00'): -62.34,
Timestamp('2016-02-01 00:00:00'): -35.7,
Timestamp('2016-02-04 00:00:00'): -31.84,
Timestamp('2016-02-03 00:00:00'): -33.87},
{'city': 'yyz',
Timestamp('2016-02-02 00:00:00'): -24.31,
Timestamp('2016-02-05 00:00:00'): -51.17,
Timestamp('2016-02-01 00:00:00'): -22.07,
Timestamp('2016-02-04 00:00:00'): -31.0,
Timestamp('2016-02-03 00:00:00'): -23.0},
{'city': 'yvr',
Timestamp('2016-02-02 00:00:00'): -24.31,
Timestamp('2016-02-05 00:00:00'): -51.17,
Timestamp('2016-02-01 00:00:00'): -22.07,
Timestamp('2016-02-04 00:00:00'): -31.0,
Timestamp('2016-02-03 00:00:00'): -23.0},
{'city': 'dfw',
Timestamp('2016-02-02 00:00:00'): -32.17,
Timestamp('2016-02-05 00:00:00'): -43.77,
Timestamp('2016-02-01 00:00:00'): -34.84,
Timestamp('2016-02-04 00:00:00'): 0.27,
Timestamp('2016-02-03 00:00:00'): -11.49},
{'city': 'ewr',
Timestamp('2016-02-02 00:00:00'): -28.87,
Timestamp('2016-02-05 00:00:00'): -59.66,
Timestamp('2016-02-01 00:00:00'): -28.4,
Timestamp('2016-02-04 00:00:00'): -32.94,
Timestamp('2016-02-03 00:00:00'): -29.06}]
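To reproduce the problem, the two frames can be rebuilt from the dumps above. This is only a sketch: the values and the (unsorted) column order are copied from the listings, and building data_df from a dict of columns is my own choice, not something the question shows.

```python
import pandas as pd

# Pair table: each row names the two cities to compare.
df_pairs = pd.DataFrame({
    "city1": ["sfo", "sfo", "sfo", "sfo"],
    "city2": ["yyz", "yvr", "dfw", "ewr"],
})

# One entry per column, in the same order as the dump (dates are not sorted).
# Rows are sfo, yyz, yvr, dfw, ewr.
data_df = pd.DataFrame({
    "city": ["sfo", "yyz", "yvr", "dfw", "ewr"],
    pd.Timestamp("2016-02-02"): [-33.63, -24.31, -24.31, -32.17, -28.87],
    pd.Timestamp("2016-02-05"): [-62.34, -51.17, -51.17, -43.77, -59.66],
    pd.Timestamp("2016-02-01"): [-35.70, -22.07, -22.07, -34.84, -28.40],
    pd.Timestamp("2016-02-04"): [-31.84, -31.00, -31.00, 0.27, -32.94],
    pd.Timestamp("2016-02-03"): [-33.87, -23.00, -23.00, -11.49, -29.06],
})

print(df_pairs)
print(data_df)
```

Note that data_df has a default integer row index (0 to 4) and keeps the city names in an ordinary column.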
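So I have a DataFrame named df_pairs. For each pair in df_pairs, I want to look up city1 and city2 in data_df, subtract one from the other, take the sign of the resulting difference time series, split both the signs and the difference values into their positive and negative parts, and sum each part across the date columns of data_df.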
diff_df_sign_pos = diff_df_sign_neg = diff_df_pos = diff_df_neg = 0
for i in range(0, len(data_df.columns)):
    a = pd.merge(df_pairs[['city1','city2']], data_df.ix[:, [i]], left_on='city1', right_index=True, how='left').set_index(['city1', 'city2'])
    b = pd.merge(df_pairs[['city1','city2']], data_df.ix[:, [i]], left_on='city2', right_index=True, how='left').set_index(['city1', 'city2'])
    diff_df = b - a
    diff_df_sign = np.sign(diff_df)
    diff_df_sign_pos += diff_df_sign.clip(lower=0)
    diff_df_sign_neg += diff_df_sign.clip(upper=0)
    diff_df_pos += diff_df.clip(lower=0)
    diff_df_neg += diff_df.clip(upper=0)
If you run the code above, you will find that the final values of diff_df_sign_pos, diff_df_sign_neg, diff_df_pos, and diff_df_neg are all NaN.
For example, the final result for diff_df_sign_pos should look like this:
2016-02-03 00:00:00
city1 city2
sfo yyz 5.0
yvr 5.0
dfw 5.0
ewr 4.0
This tells us that all 5 differences between sfo and each of yyz, yvr, and dfw are positive.
Answer 0 (score: 1)
Why not do it like this:
df_city1 = pd.merge(df_pairs[['city1']], data_df, left_on='city1', right_on='city', how='left')
df_city2 = pd.merge(df_pairs[['city2']], data_df, left_on='city2', right_on='city', how='left')
# Subtract only the date columns; the string key columns cannot be subtracted.
date_cols = data_df.columns.drop('city')
diff = df_city2[date_cols].subtract(df_city1[date_cols], fill_value=0)
pos_sum = diff[diff >= 0].sum(axis=1)
neg_sum = diff[diff < 0].sum(axis=1)
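instead of looping over all the columns, merging 2 * (number of columns) times, not to mention the indexing, and then that complicated np.sign and .clip bit... Your data and df_pairs have a one-to-one correspondence, right?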
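The boolean-mask step can be checked by hand on a single pair. As a small sketch, the row below holds the ewr minus sfo differences worked out from the data_df values in the question (in the question's column order); the one-row frame itself is my own construction for illustration.

```python
import pandas as pd

# ewr - sfo for 2016-02-02, -05, -01, -04, -03, from the values in data_df.
diff = pd.DataFrame([[4.76, 2.68, 7.30, -1.10, 4.81]])

# Indexing with a boolean frame keeps the shape and puts NaN where the
# condition is False; sum() skips NaN by default.
masked = diff[diff >= 0]
pos_sum = masked.sum(axis=1)
neg_sum = diff[diff < 0].sum(axis=1)

print(masked)
print(pos_sum, neg_sum)
```

This gives the same split that clip(lower=0) and clip(upper=0) produce, because the cells that clip would set to 0 are simply skipped here.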
Answer 1 (score: 0)
Give this a run:
Take out the initial variables and get rid of the for loop.
a = pd.merge(df_pairs, data_df, left_on='city1', right_on='city', how='left').set_index(['city1', 'city2'])
b = pd.merge(df_pairs, data_df, left_on='city2', right_on='city', how='left').set_index(['city1', 'city2'])
del a['city']
del b['city']
Now do the calculation once and sum across each row (axis=1):
diff_df = b - a
diff_df_sign = np.sign(diff_df)
diff_df_sign_pos = diff_df_sign.clip(lower=0).sum(axis=1)
diff_df_sign_neg = diff_df_sign.clip(upper=0).sum(axis=1)
diff_df_pos = diff_df.clip(lower=0).sum(axis=1)
diff_df_neg = diff_df.clip(upper=0).sum(axis=1)
Does this look like the output you want?
city1 city2
sfo yyz 5
yvr 5
dfw 5
ewr 4
dtype: float64
city1 city2
sfo yyz 0
yvr 0
dfw 0
ewr -1
dtype: float64
city1 city2
sfo yyz 45.83
yvr 45.83
dfw 75.38
ewr 19.55
dtype: float64
city1 city2
sfo yyz 0.0
yvr 0.0
dfw 0.0
ewr -1.1
dtype: float64