I have the following dataframe:
Name 2018-02-28 2018-01-31 2018-12-31 2017-11-30 2017-10-31 2017-09-30
ID
11 ABC 110 109 108 100 95 90
22 DEF 120 119 118 100 85 80
33 GHI 130 129 128 100 75 70
I would like to obtain the table below, where each value is the % chg relative to the value of one particular column in the same row, in this case the 2017-11-30 column.
Then, create a row at the bottom of the dataframe that provides the average of each column.
Name 2018-02-28 2018-01-31 2018-12-31 2017-11-30 2017-10-31 2017-09-30
ID
11 ABC 10.0% 9.0% 8.0% 0.0% -5.0% -10.0%
22 DEF 20.0% 19.0% 18.0% 0.0% -15.0% -20.0%
33 GHI 30.0% 29.0% 28.0% 0.0% -25.0% -30.0%
Average 20.0% 19.0% 18.0% 0.0% -15.0% -20.0%
My actual dataframe has about 50 columns and 50 rows, and the column used as the "base" value for the % chg calculation is the one from 1 year ago (i.e. column 14). A solution as generic as possible would be really appreciated!
Answer 0 (score: 2)
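You can use numpy for this. The output below is in decimals; you can multiply by 100 if necessary.
import numpy as np
# Divide each row by its base value (column position 4, i.e. 2017-11-30), then append the column means
df[df.columns[1:]] = (df.iloc[:, 1:].values / df.iloc[:, 4].values[:, None]) - 1
df.loc[len(df)+1] = ['Average'] + np.mean(df.iloc[:, 1:].values, axis=0).tolist()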
Result
Name 2018-02-28 2018-01-31 2018-12-31 2017-11-30 2017-10-31 \
ID
11 ABC 0.1 0.09 0.08 0.0 -0.05
22 DEF 0.2 0.19 0.18 0.0 -0.15
33 GHI 0.3 0.29 0.28 0.0 -0.25
4 Average 0.2 0.19 0.18 0.0 -0.15
2017-09-30
ID
11 -0.1
22 -0.2
33 -0.3
4 -0.2
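To see what the `[:, None]` in the division above does, here is a small NumPy-only sketch using the question's numbers. The variable names (`values`, `base`, `pct`) are illustrative and not part of the answer.

```python
import numpy as np

# The question's numeric values: one row per ID, one column per date.
values = np.array([
    [110, 109, 108, 100, 95, 90],
    [120, 119, 118, 100, 85, 80],
    [130, 129, 128, 100, 75, 70],
], dtype=float)

base = values[:, 3]        # shape (3,): the 2017-11-30 column
base_col = base[:, None]   # shape (3, 1): one base value per row

# (3, 6) / (3, 1) broadcasts across the columns,
# so every row is divided by its own base value.
pct = values / base_col - 1

# Without the reshape, (3, 6) / (3,) has mismatched trailing
# dimensions and NumPy refuses to broadcast.
try:
    values / base
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(pct.round(2))
print("1-D divisor rejected:", mismatch_raised)
```

In this data every base value happens to be 100, so the rows differ only in their numerators. With different base values per row, the column-vector shape is what keeps each row paired with its own divisor.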
Explanation
df.iloc[:, 1:] extracts the 2nd column onwards; .values retrieves the numpy array representation from the dataframe. [:, None] adds a second axis to the base column, turning it into a column vector so that each row is divided by its own base value.
Answer 1 (score: 2)
This is a continuation of jpp's solution, cleaned up using a MultiIndex. First, we recreate the dataset from a string with io.StringIO.
import io
import pandas as pd
import numpy as np
data = '''\
ID Name 2018-02-28 2018-01-31 2018-12-31 2017-11-30 2017-10-31 2017-09-30
11 ABC 110 109 108 100 95 90
22 DEF 120 119 118 100 85 80
33 GHI 130 129 128 100 75 70'''
df = pd.read_csv(io.StringIO(data), sep=r'\s+').set_index('ID')
Alternative with a single index:
# Pop away the column names and add Average
names = df.pop('Name').tolist() + ['Average']
# Express each value as a fractional change relative to column index 3 (the 4th date column)
df = df.div(df.iloc[:, 3], axis=0) - 1
# Get the mean and add it as a new row (99 will be the ID)
df.loc[99] = df.mean()
# Insert back
df.insert(0, 'Name', names)
print(df)
Returns
Name 2018-02-28 2018-01-31 2018-12-31 2017-11-30 2017-10-31 \
ID
11 ABC 0.1 0.09 0.08 0.0 -0.05
22 DEF 0.2 0.19 0.18 0.0 -0.15
33 GHI 0.3 0.29 0.28 0.0 -0.25
99 Average 0.2 0.19 0.18 0.0 -0.15
2017-09-30
ID
11 -0.1
22 -0.2
33 -0.3
99 -0.2
Alternative with a MultiIndex:
# Set dual index
df = df.set_index([df.index, 'Name'])
# Express each value as a fractional change relative to column index 3 (the 4th column)
df = df.div(df.iloc[:, 3], axis=0) - 1
# Get the mean and append it as a row labelled 'Average'
s = df.mean()
s.name = 'Average'
df = pd.concat([df, s.to_frame().T])
print(df)
Output of df:
2018-02-28 2018-01-31 2018-12-31 2017-11-30 2017-10-31 2017-09-30
(11, ABC) 0.1 0.09 0.08 0.0 -0.05 -0.1
(22, DEF) 0.2 0.19 0.18 0.0 -0.15 -0.2
(33, GHI) 0.3 0.29 0.28 0.0 -0.25 -0.3
Average 0.2 0.19 0.18 0.0 -0.15 -0.2
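For the general case in the question (about 50 columns, base column chosen by the asker), the same idea can be wrapped in a small function. This is only a sketch and not part of either answer: `pct_change_vs_base` and its `label` parameter are made-up names, and it assumes every numeric column should be included and that the base column contains no zeros. The last step formats the decimals as percentage strings to match the table requested in the question.

```python
import io
import pandas as pd

def pct_change_vs_base(df, base_col, label="Average"):
    """Hypothetical helper: fractional change of each numeric column
    relative to base_col, with a row of column means appended."""
    num = df.select_dtypes("number")            # drops text columns such as Name
    out = num.div(num[base_col], axis=0) - 1    # row-wise division by the base column
    out.loc[label] = out.mean()                 # append the mean row
    return out

data = """\
ID Name 2018-02-28 2018-01-31 2018-12-31 2017-11-30 2017-10-31 2017-09-30
11 ABC 110 109 108 100 95 90
22 DEF 120 119 118 100 85 80
33 GHI 130 129 128 100 75 70"""
df = pd.read_csv(io.StringIO(data), sep=r"\s+").set_index("ID")

# The base column can be picked by label, or by position via df.columns[i].
result = pct_change_vs_base(df, "2017-11-30")

# Format as percentage strings for display only; keep `result` numeric.
formatted = result.apply(lambda col: col.map("{:.1%}".format))
print(formatted)
```

Keeping the numeric frame and the formatted frame separate means further calculations can still be done on `result`, since the formatted version holds strings.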