I have a data frame with MultiIndex columns, as shown below:
            ACA FP Equity              UCG IM Equity
               LAST PRICE      VOLUME     LAST PRICE      VOLUME
date
2010-01-04         12.825   5879617.0        15.0292  10844639.0
2010-01-05         13.020   6928587.0        14.8092  16456228.0
2010-01-06         13.250   5290631.0        14.6834  10446450.0
2010-01-07         13.255   5328586.0        15.0292  31900341.0
2010-01-08         13.470   7160295.0        15.1707  40750768.0
What is the syntax for adding a third column under each asset in the data frame? For example:
df['ACA FP Equity']['PriceVolume'] = df['ACA FP Equity']['LAST PRICE']*3
But I want to do this for every equity without adding each one by hand.
Thanks in advance.
Answer 0 (score: 1)
If you need all LAST PRICE columns multiplied by 3, use slicers to select them and rename the columns:
idx = pd.IndexSlice
df1 = df.loc[:, idx[:, 'LAST PRICE']].rename(columns={'LAST PRICE':'PriceVolume'}) * 3
print (df1)
            ACA FP Equity  UCG IM Equity
              PriceVolume    PriceVolume
2010-01-04         38.475        45.0876
2010-01-05         39.060        44.4276
2010-01-06         39.750        44.0502
2010-01-07         39.765        45.0876
2010-01-08         40.410        45.5121
Then use concat to join the result to the original data frame:
print (pd.concat([df,df1], axis=1))
            ACA FP Equity              UCG IM Equity              ACA FP Equity  \
               LAST PRICE      VOLUME     LAST PRICE      VOLUME    PriceVolume
2010-01-04         12.825   5879617.0        15.0292  10844639.0         38.475
2010-01-05         13.020   6928587.0        14.8092  16456228.0         39.060
2010-01-06         13.250   5290631.0        14.6834  10446450.0         39.750
2010-01-07         13.255   5328586.0        15.0292  31900341.0         39.765
2010-01-08         13.470   7160295.0        15.1707  40750768.0         40.410

            UCG IM Equity
              PriceVolume
2010-01-04        45.0876
2010-01-05        44.4276
2010-01-06        44.0502
2010-01-07        45.0876
2010-01-08        45.5121
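In the concat output the new PriceVolume columns are appended at the far right instead of sitting next to their equity. A minimal sketch of one way to group them, assuming a frame rebuilt from the first two rows of the question's data (the variable name `out` is only illustrative), is to sort the column index after concatenating:

```python
import pandas as pd

# Rebuild the sample frame from the question (first two rows only).
cols = pd.MultiIndex.from_product(
    [["ACA FP Equity", "UCG IM Equity"], ["LAST PRICE", "VOLUME"]]
)
df = pd.DataFrame(
    [[12.825, 5879617.0, 15.0292, 10844639.0],
     [13.020, 6928587.0, 14.8092, 16456228.0]],
    index=pd.to_datetime(["2010-01-04", "2010-01-05"]),
    columns=cols,
)
df.index.name = "date"

# Same slicer-and-rename step as in the answer.
idx = pd.IndexSlice
df1 = df.loc[:, idx[:, "LAST PRICE"]].rename(columns={"LAST PRICE": "PriceVolume"}) * 3

# Sorting the column index places each equity's columns side by side.
out = pd.concat([df, df1], axis=1).sort_index(axis=1)
print(out)
```

The sort is lexicographic on both column levels, so within each equity the columns come out in alphabetical order rather than in insertion order.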
Another solution, without concat, is to create tuples from the selected_df columns and then assign the output:
idx = pd.IndexSlice
selected_df = df.loc[:, idx[:, 'LAST PRICE']]
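# get_level_values reads the labels actually present in selected_df;
# columns.levels could also list labels that are no longer in use
new_cols = [(x, 'PriceVolume') for x in selected_df.columns.get_level_values(0)]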
print (new_cols)
[('ACA FP Equity', 'PriceVolume'), ('UCG IM Equity', 'PriceVolume')]
df[new_cols] = selected_df * 3
print(df)
            ACA FP Equity              UCG IM Equity              ACA FP Equity  \
               LAST PRICE      VOLUME     LAST PRICE      VOLUME    PriceVolume
2010-01-04         12.825   5879617.0        15.0292  10844639.0         38.475
2010-01-05         13.020   6928587.0        14.8092  16456228.0         39.060
2010-01-06         13.250   5290631.0        14.6834  10446450.0         39.750
2010-01-07         13.255   5328586.0        15.0292  31900341.0         39.765
2010-01-08         13.470   7160295.0        15.1707  40750768.0         40.410

            UCG IM Equity
              PriceVolume
2010-01-04        45.0876
2010-01-05        44.4276
2010-01-06        44.0502
2010-01-07        45.0876
2010-01-08        45.5121
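One caveat when building the tuples from a sliced frame: a MultiIndex keeps every label defined on the original index, even labels that the slice no longer uses. The sketch below illustrates this on a one-row frame assumed for the example (the name `sub` is hypothetical); it compares `columns.levels` with `get_level_values` and with `remove_unused_levels`:

```python
import pandas as pd

cols = pd.MultiIndex.from_product(
    [["ACA FP Equity", "UCG IM Equity"], ["LAST PRICE", "VOLUME"]]
)
df = pd.DataFrame([[12.825, 5879617.0, 15.0292, 10844639.0]], columns=cols)

# Keep only one equity; the column index still remembers both labels.
sub = df[["ACA FP Equity"]]

# All labels defined on level 0, used or not.
print(list(sub.columns.levels[0]))

# Only the labels that actually occur in the sliced columns.
print(list(sub.columns.get_level_values(0).unique()))

# Pruning the index first makes levels agree with the data.
print(list(sub.columns.remove_unused_levels().levels[0]))
```

In the answer's case every equity survives the slice, so both approaches yield the same tuples; the difference only matters when some top-level labels have been filtered out.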
Answer 1 (score: 1)
The most elegant way I can think of is:
df['ACA FP Equity']['LAST PRICE'].apply(lambda x: x * 3)
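The apply statement lets you run a given function on every value of a specified column of the data frame, in this case a lambda expression that multiplies each input by 3. Running apply returns a pandas Series, which can then be added as a column of the data frame.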
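Here is a simple example on the data frame above. It uses a tuple key, because chained indexing such as df['ACA FP Equity']['PriceVolume'] = ... may assign to a temporary copy and leave df unchanged:
df[('ACA FP Equity', 'PriceVolume')] = df[('ACA FP Equity', 'LAST PRICE')].apply(lambda x: x * 3)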
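The example above handles a single equity, while the question asks for every equity at once. A minimal sketch of one way to generalize it, assuming a frame rebuilt from the first two rows of the question's data, is to loop over the top-level column labels and assign with tuple keys. Plain multiplication is used here instead of apply with a lambda; for a simple scaling like this the two give the same values.

```python
import pandas as pd

# Rebuild the sample frame from the question (first two rows only).
cols = pd.MultiIndex.from_product(
    [["ACA FP Equity", "UCG IM Equity"], ["LAST PRICE", "VOLUME"]]
)
df = pd.DataFrame(
    [[12.825, 5879617.0, 15.0292, 10844639.0],
     [13.020, 6928587.0, 14.8092, 16456228.0]],
    index=pd.to_datetime(["2010-01-04", "2010-01-05"]),
    columns=cols,
)

# Visit each top-level label once and add a new second-level column under it.
for equity in df.columns.get_level_values(0).unique():
    df[(equity, "PriceVolume")] = df[(equity, "LAST PRICE")] * 3

# Optional: sort so each equity's columns sit next to each other.
df = df.sort_index(axis=1)
print(df)
```

Assigning with a tuple key such as `(equity, "PriceVolume")` writes directly into `df`, which sidesteps the copy problem that chained indexing can run into.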