I have a DataFrame like this:
Case Peak 'A' Peak 'B' Volume 'C' Volume 'D'
1 5.00 4.00 0.34 0.32
2 5.70 6.00 0.14 0.15
3 11.00 20.00 0.42 0.50
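The expected output adds these columns:
'Diff Peak': the percentage difference ((B - A) / B) * 100.
'Diff Vol': the percentage difference ((D - C) / D) * 100.
'Within Range' for Peak: Yes if 'Diff Peak' lies between -15% and 25%, otherwise No.
'Within Range' for Volume: Yes if 'Diff Vol' lies between -10% and 20%, otherwise No.
How can I do this?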
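The two formulas and the range rules can be sketched as two small helpers. This is a minimal sketch. It assumes the column names are exactly as printed in the table, quotes included. `pct_diff` and `label_range` are hypothetical names used only for illustration, and both range bounds are treated as inclusive.

```python
import pandas as pd

# Sample data from the question (column names assumed to contain the quotes)
df = pd.DataFrame({
    "Case": [1, 2, 3],
    "Peak 'A'": [5.00, 5.70, 11.00],
    "Peak 'B'": [4.00, 6.00, 20.00],
    "Volume 'C'": [0.34, 0.14, 0.42],
    "Volume 'D'": [0.32, 0.15, 0.50],
})


def pct_diff(value: pd.Series, reference: pd.Series) -> pd.Series:
    """Hypothetical helper: ((reference - value) / reference) * 100."""
    return (reference - value) / reference * 100


def label_range(s: pd.Series, low: float, high: float) -> pd.Series:
    """Hypothetical helper: 'Yes' if low <= s <= high (inclusive), else 'No'."""
    return s.between(low, high).map({True: "Yes", False: "No"})


df["Diff Peak"] = pct_diff(df["Peak 'A'"], df["Peak 'B'"])
df["Diff Vol"] = pct_diff(df["Volume 'C'"], df["Volume 'D'"])
df["Peak Within Range"] = label_range(df["Diff Peak"], -15, 25)
df["Volume Within Range"] = label_range(df["Diff Vol"], -10, 20)
print(df.round(2))
```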
Answer 0 (score: 1)
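Just create the new columns:
import numpy as np

# Percentage differences relative to Peak 'B' and Volume 'D'
df['Diff Peak'] = (df["Peak 'B'"] - df["Peak 'A'"]) / df["Peak 'B'"] * 100
df['Diff Vol'] = (df["Volume 'D'"] - df["Volume 'C'"]) / df["Volume 'D'"] * 100
# Boolean flags: True when the difference lies inside the inclusive range
df['Within Range Peak'] = np.logical_and(df['Diff Peak'] >= -15.0, df['Diff Peak'] <= 25.0)
df['Within Range Vol'] = np.logical_and(df['Diff Vol'] >= -10.0, df['Diff Vol'] <= 20.0)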
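The `np.logical_and` check above produces True/False, not the Yes/No labels the question asks for. The sketch below shows that the same inclusive test can be written with pandas' `Series.between`, and that `np.where` turns the booleans into labels. The three Diff Peak values are hard-coded here. They are computed from the question's data and are an assumption of this sketch.

```python
import numpy as np
import pandas as pd

# Diff Peak values for the three sample cases
diff_peak = pd.Series([-25.0, 5.0, 45.0], name="Diff Peak")

# Two comparisons combined element-wise, as in the answer above
mask_np = np.logical_and(diff_peak >= -15.0, diff_peak <= 25.0)

# The same inclusive range check with Series.between
mask_between = diff_peak.between(-15.0, 25.0)

# Booleans -> 'Yes' / 'No' labels
labels = np.where(mask_between, "Yes", "No")
print(labels.tolist())
```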
Answer 1 (score: 1)
If you don't need a MultiIndex in the columns, you can use:
import numpy as np

# apply the percentage-difference formulas
df['Diff Peak'] = df["Peak 'B'"].sub(df["Peak 'A'"]).div(df["Peak 'B'"]).mul(100)
df['Diff Vol'] = df["Volume 'D'"].sub(df["Volume 'C'"]).div(df["Volume 'D'"]).mul(100)
#check range, then add Yes or No
df['Peak Within Range'] = np.where(df['Diff Peak'].between(-15, 25), 'Yes', 'No')
df['Volume Within Range'] = np.where(df['Diff Vol'].between(-10, 20), 'Yes', 'No')
# round (if necessary), convert to string and append '%'
df['Diff Peak'] = df['Diff Peak'].round(2).astype(str) + '%'
df['Diff Vol'] = df['Diff Vol'].round(2).astype(str) + '%'
print(df)
   Case  Peak 'A'  Peak 'B'  Volume 'C'  Volume 'D' Diff Peak Diff Vol  \
0     1       5.0       4.0        0.34        0.32    -25.0%   -6.25%
1     2       5.7       6.0        0.14        0.15      5.0%    6.67%
2     3      11.0      20.0        0.42        0.50     45.0%    16.0%

  Peak Within Range Volume Within Range
0                No                 Yes
1               Yes                 Yes
2                No                 Yes
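`round(2).astype(str)` drops trailing zeros. As a result, the output mixes `16.0%` with `-6.25%`. If every value should show two decimals, a format string can produce the display text instead. The sketch below assumes the Volume columns from the question. As in the answer, the range check should run on the numeric values before they are converted to strings.

```python
import pandas as pd

# Volume 'C' and Volume 'D' from the question
c = pd.Series([0.34, 0.14, 0.42])
d = pd.Series([0.32, 0.15, 0.50])
diff_vol = (d - c) / d * 100

# round(2).astype(str) drops trailing zeros
loose = diff_vol.round(2).astype(str) + "%"

# A format string always shows exactly two decimals
fixed = diff_vol.map("{:.2f}%".format)
print(loose.tolist())
print(fixed.tolist())
```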
But if you do need a MultiIndex in the columns:
import numpy as np
import pandas as pd

df = df.set_index('Case')
# lowercase 'peak' keeps str.replace('Peak', ...) below from matching twice in this name
df['Peak Diff peak'] = df["Peak 'B'"].sub(df["Peak 'A'"]).div(df["Peak 'B'"]).mul(100)
df['Volume Diff Vol'] = df["Volume 'D'"].sub(df["Volume 'C'"]).div(df["Volume 'D'"]).mul(100)
df['Peak Within Range'] = np.where(df['Peak Diff peak'].between(-15, 25), 'Yes', 'No')
df['Volume Within Range'] = np.where(df['Volume Diff Vol'].between(-10, 20), 'Yes', 'No')
df['Peak Diff peak'] = df['Peak Diff peak'].round(2).astype(str) + '%'
df['Volume Diff Vol'] = df['Volume Diff Vol'].round(2).astype(str) + '%'
# keep only the columns whose names start with 'Peak'
df1 = df.filter(regex='^Peak')
# prepend the group label and mark the split point with '_'
df1.columns = df1.columns.str.replace('Peak', 'Peak (+25% to -15%)_')
# split each name on '_ ' into a two-level MultiIndex
df1.columns = df1.columns.str.split('_ ', expand=True)
print (df1)
Peak (+25% to -15%)
'A' 'B' Diff peak Within Range
Case
1 5.0 4.0 -25.0% No
2 5.7 6.0 5.0% Yes
3 11.0 20.0 45.0% No
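The replace-then-split steps can also be done in one go by building the column MultiIndex directly with `pd.MultiIndex.from_tuples`. Because no string replacement happens, the lowercase 'peak' trick is not needed. In this minimal sketch, `df1` is rebuilt by hand from the printed output above, which is an assumption. The outer label `top` is taken from the answer.

```python
import pandas as pd

# Peak part of the result, indexed by Case (values copied from the output above)
df1 = pd.DataFrame(
    {
        "Peak 'A'": [5.0, 5.7, 11.0],
        "Peak 'B'": [4.0, 6.0, 20.0],
        "Peak Diff peak": ["-25.0%", "5.0%", "45.0%"],
        "Peak Within Range": ["No", "Yes", "No"],
    },
    index=pd.Index([1, 2, 3], name="Case"),
)

# Drop the leading 'Peak ' from each name and place it under one shared top level
top = "Peak (+25% to -15%)"
df1.columns = pd.MultiIndex.from_tuples(
    [(top, name.split(" ", 1)[1]) for name in df1.columns]
)
print(df1)
```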
# same steps as for df1, applied to the Volume columns
df2 = df.filter(regex='^Volume')
df2.columns = df2.columns.str.replace('Volume', 'Volume (+20% to -10%)_')
df2.columns = df2.columns.str.split('_ ', expand=True)
print (df2)
Volume (+20% to -10%)
'C' 'D' Diff Vol Within Range
Case
1 0.34 0.32 -6.25% Yes
2 0.14 0.15 6.67% Yes
3 0.42 0.50 16.0% Yes
# concatenate both DataFrames side by side
df3 = pd.concat([df1, df2], axis=1).reset_index()
print (df3)
Case Peak (+25% to -15%) Volume (+20% to -10%) \
'A' 'B' Diff peak Within Range 'C'
0 1 5.0 4.0 -25.0% No 0.34
1 2 5.7 6.0 5.0% Yes 0.14
2 3 11.0 20.0 45.0% No 0.42
'D' Diff Vol Within Range
0 0.32 -6.25% Yes
1 0.15 6.67% Yes
2 0.50 16.0% Yes
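When `pd.concat` receives a dict, its keys become the outer level of the resulting column MultiIndex. That means the whole `df3` can be built in one loop, without renaming any strings. This is a minimal sketch, not the answer's method. `SPEC` and `blocks` are hypothetical names, and the group labels and ranges are copied from the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Case": [1, 2, 3],
    "Peak 'A'": [5.00, 5.70, 11.00],
    "Peak 'B'": [4.00, 6.00, 20.00],
    "Volume 'C'": [0.34, 0.14, 0.42],
    "Volume 'D'": [0.32, 0.15, 0.50],
}).set_index("Case")

# Hypothetical spec: group label -> (value col, reference col, diff label, low, high)
SPEC = {
    "Peak (+25% to -15%)": ("Peak 'A'", "Peak 'B'", "Diff peak", -15, 25),
    "Volume (+20% to -10%)": ("Volume 'C'", "Volume 'D'", "Diff Vol", -10, 20),
}

blocks = {}
for group, (val, ref, diff_name, low, high) in SPEC.items():
    diff = (df[ref] - df[val]) / df[ref] * 100
    blocks[group] = pd.DataFrame({
        val.split(" ", 1)[1]: df[val],  # e.g. "'A'"
        ref.split(" ", 1)[1]: df[ref],  # e.g. "'B'"
        diff_name: diff.round(2).astype(str) + "%",
        # range check on the numeric diff, before it became a string
        "Within Range": np.where(diff.between(low, high), "Yes", "No"),
    })

# Dict keys become the outer level of the column MultiIndex
df3 = pd.concat(blocks, axis=1).reset_index()
print(df3)
```

Adding another measurement then only takes one more entry in `SPEC`.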