Pandas - Rebasing values based on a specific column

Time: 2018-03-23 00:25:20

Tags: python pandas dataframe

I have the following dataframe:

     Name    2018-02-28    2018-01-31    2018-12-31    2017-11-30    2017-10-31    2017-09-30
ID
11   ABC      110           109           108             100            95                90
22   DEF      120           119           118             100            85                80
33   GHI      130           129           128             100            75                70

I would like to obtain the table below, where each value is the % change of that row's value relative to a particular column, in this case the 2017-11-30 column.

Then, create a row at the bottom of the dataframe that provides the average.

     Name    2018-02-28    2018-01-31    2018-12-31    2017-11-30    2017-10-31    2017-09-30
ID
11   ABC      10.0%         9.0%         8.0%             0.0%         -5.0%           -10.0%
22   DEF      20.0%         19.0%        18.0%            0.0%         -15.0%          -20.0%
33   GHI      30.0%         29.0%        28.0%            0.0%         -25.0%          -30.0%
    Average   20.0%         19.0%        18.0%            0.0%         -15.0%          -20.0%

My actual dataframe has about 50 columns and 50 rows, and the column used as the "base" when calculating the % change is the one from 1 year ago (i.e. column 14). A solution as generic as possible would be really appreciated!

2 answers:

Answer 0 (score: 2)

You can use numpy for this. The output below is in decimals; multiply by 100 if you need percentages.

import numpy as np

# Divide every date column by the base column (position 4) and subtract 1
df.iloc[:, 1:] = (df.iloc[:, 1:].values / df.iloc[:, 4].values[:, None]) - 1

df.loc[len(df)+1] = ['Average'] + np.mean(df.iloc[:, 1:].values, axis=0).tolist()

Result

       Name  2018-02-28  2018-01-31  2018-12-31  2017-11-30  2017-10-31  \
ID                                                                        
11      ABC         0.1        0.09        0.08         0.0       -0.05   
22      DEF         0.2        0.19        0.18         0.0       -0.15   
33      GHI         0.3        0.29        0.28         0.0       -0.25   
4   Average         0.2        0.19        0.18         0.0       -0.15   

    2017-09-30  
ID              
11        -0.1  
22        -0.2  
33        -0.3  
4         -0.2  
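
If the percentage strings from the question are wanted rather than decimals, one possible way is to format the values after the calculation. This is only a sketch: the small `rebased` frame below is a hypothetical stand-in for the result above, and the name `as_percent` is made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for part of the rebased result (decimals)
rebased = pd.DataFrame(
    {"2018-02-28": [0.10, 0.20], "2017-10-31": [-0.05, -0.15]},
    index=pd.Index([11, 22], name="ID"),
)

# Format every value as a percentage string with one decimal place
as_percent = rebased.apply(lambda col: col.map("{:.1%}".format))
print(as_percent)
```

Note that the formatted values are strings, so it is probably best to do this last, after the average row has been computed.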

Explanation

  • df.iloc[:, 1:] extracts the 2nd column onwards; .values retrieves the numpy array representation from the dataframe.
  • [:, None] adds a second axis to the base column, turning an array of shape (n,) into shape (n, 1), so that broadcasting divides each row by its own base value.
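
To see what `[:, None]` does in isolation, here is a small numpy-only sketch. The `values` array is a made-up subset of the question's data, not the full table.

```python
import numpy as np

# Two rows, three columns; the middle column plays the role of the base
values = np.array([[110.0, 100.0, 90.0],
                   [120.0, 100.0, 80.0]])
base = values[:, 1]  # 1-D array of shape (2,)

# base[:, None] has shape (2, 1), so each row is divided by its own base value.
# Dividing by the 1-D base directly would not broadcast against shape (2, 3).
rebased = values / base[:, None] - 1

print(base.shape, base[:, None].shape)
print(rebased)
```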

Answer 1 (score: 2)

I can't comment, so I'm posting this as a continuation of jpp's solution, cleaned up using a MultiIndex. First, we recreate the dataset from a string.

from io import StringIO

import pandas as pd
import numpy as np

data = '''\
ID   Name     2018-02-28    2018-01-31    2018-12-31    2017-11-30    2017-10-31    2017-09-30
11   ABC      110           109           108             100            95                90
22   DEF      120           119           118             100            85                80
33   GHI      130           129           128             100            75                70'''

# pd.compat.StringIO is not available in newer pandas versions; io.StringIO does the same job
df = pd.read_csv(StringIO(data), sep=r'\s+').set_index('ID')

Alternative with a single index:

# Pop the Name column and add a label for the Average row
names = df.pop('Name').tolist() + ['Average']

# Percent change relative to the base column (position 3 once Name is popped)
df = df.div(df.iloc[:, 3], axis=0) - 1

# Add the column means as a new row (99 will be the ID)
df.loc[99] = df.mean()

# Insert the names back as the first column
df.insert(0, 'Name', names)
print(df)

Returns

       Name  2018-02-28  2018-01-31  2018-12-31  2017-11-30  2017-10-31  \
ID                                                                        
11      ABC         0.1        0.09        0.08         0.0       -0.05   
22      DEF         0.2        0.19        0.18         0.0       -0.15   
33      GHI         0.3        0.29        0.28         0.0       -0.25   
99  Average         0.2        0.19        0.18         0.0       -0.15   

    2017-09-30  
ID              
11        -0.1  
22        -0.2  
33        -0.3  
99        -0.2 

Alternative with a MultiIndex:

# Starting again from the freshly loaded df, set a dual index
df = df.set_index([df.index, 'Name'])

# Percent change relative to column index 3 (the 4th column)
df = df.div(df.iloc[:, 3], axis=0) - 1

# Get the mean and append
# (DataFrame.append was removed in pandas 2.0, so this step needs an older pandas)
s = df.mean()
s.name = 'Average'
df = df.append(s)
print(df)

Output of df:

            2018-02-28  2018-01-31  2018-12-31  2017-11-30  2017-10-31  2017-09-30
(11, ABC)          0.1        0.09        0.08         0.0       -0.05        -0.1
(22, DEF)          0.2        0.19        0.18         0.0       -0.15        -0.2
(33, GHI)          0.3        0.29        0.28         0.0       -0.25        -0.3
Average            0.2        0.19        0.18         0.0       -0.15        -0.2
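
Since the question asks for something as generic as possible, the approach above could be wrapped in a small function that takes the base column by label. This is only a sketch under a few assumptions: the function name `rebase` and its parameters `base_col`, `label_col` and `avg_label` are made up for illustration, the frame has a single text column, and the sample data is a cut-down version of the question's table.

```python
import pandas as pd


def rebase(df, base_col, label_col="Name", avg_label="Average"):
    """Return the % change of each numeric column relative to base_col, plus a mean row."""
    # Keep the labels aside and add one for the average row
    names = df[label_col].tolist() + [avg_label]
    numeric = df.drop(columns=[label_col]).astype(float)

    # Divide each row by its own base value, then subtract 1
    rebased = numeric.div(numeric[base_col], axis=0) - 1

    # Add the column means as a new row, then put the labels back in front
    rebased.loc[avg_label] = rebased.mean()
    rebased.insert(0, label_col, names)
    return rebased


# Cut-down version of the question's data
df = pd.DataFrame(
    {
        "Name": ["ABC", "DEF", "GHI"],
        "2018-02-28": [110, 120, 130],
        "2017-11-30": [100, 100, 100],
        "2017-09-30": [90, 80, 70],
    },
    index=pd.Index([11, 22, 33], name="ID"),
)

result = rebase(df, base_col="2017-11-30")
print(result)
```

If the base column is only known by position, its label can be looked up first with `df.columns[k]` and passed as `base_col`.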