How to conditionally pull out DataFrame elements and put them into new columns

Posted: 2017-06-18 19:25:20

Tags: python pandas dataframe

Beginner pandas question:

How do I create a new column for EPS and another column for REV?

I have this:

import numpy as np
import pandas as pd
raw_data = {'Year': [2009, 2009, 2010, 2010, 2010, 2010],
    'Quarter': [4, 4, 1, 1, 2, 2],       
    'Sector': [ 'Gas', 'Future', 'Future', 'Gas', 'Beer', 'Future'],
    'Ticker': ['NVID', 'NVID', 'ATVI', 'ATVI', 'ATVI', 'ATVI'],
    'Metric': ['EPS', 'REV', 'EPS', 'REV', 'EPS', 'REV'],
    'Mean': [1.4, 350, 0.2, 500, 0.9, 120],
    }
df = pd.DataFrame(raw_data, columns = ['Year','Quarter', 'Ticker', 'Metric','Mean'])
print(df)

I'm looking for a DataFrame like this:

   Year  Quarter  Ticker    EPS   REV
0  2009      4     NVID    1.4   350
1  2010      1     ATVI    0.2   500
2  2010      2     ATVI    0.9   120

I tried splitting EPS and REV into their own DataFrames and then merging/joining them, but ran into problems:

# where() keeps every row; rows that fail the condition become NaN
REV_df = df.where(df['Metric']=='REV', axis=0)
REV_df['Year']=df['Year']
REV_df['Quarter']=df['Quarter']
EPS_df = df.where(df['Metric']=='EPS', axis=0)
EPS_df['Year']=df['Year']
EPS_df['Quarter']=df['Quarter']
result = pd.merge(EPS_df, REV_df, left_on='Year', right_index=True, how='left')
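The split-then-merge idea from the question can also work, provided each subset keeps only its own rows and the merge joins on every identifying column rather than on Year alone. A minimal sketch under those assumptions (the names `keys`, `eps_df` and `rev_df` are illustrative, and `how='outer'` is a choice made so that a missing metric shows up as NaN instead of dropping the row):

```python
import pandas as pd

raw_data = {'Year': [2009, 2009, 2010, 2010, 2010, 2010],
            'Quarter': [4, 4, 1, 1, 2, 2],
            'Ticker': ['NVID', 'NVID', 'ATVI', 'ATVI', 'ATVI', 'ATVI'],
            'Metric': ['EPS', 'REV', 'EPS', 'REV', 'EPS', 'REV'],
            'Mean': [1.4, 350, 0.2, 500, 0.9, 120]}
df = pd.DataFrame(raw_data)

# Columns that together identify one output row (illustrative name)
keys = ['Year', 'Quarter', 'Ticker']

# Boolean indexing drops non-matching rows (df.where would keep them as NaN)
eps_df = df.loc[df['Metric'] == 'EPS', keys + ['Mean']].rename(columns={'Mean': 'EPS'})
rev_df = df.loc[df['Metric'] == 'REV', keys + ['Mean']].rename(columns={'Mean': 'REV'})

# Join on all identifying columns so each Year/Quarter/Ticker pairs up exactly once
result = pd.merge(eps_df, rev_df, on=keys, how='outer')
print(result)
```

This works, but it needs one subset per metric; the reshaping answer below handles any number of metrics in one step.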

1 answer:

Answer (score: 0)

Use set_index + unstack to move the Metric values into their own columns:

df1 = df.set_index(['Year','Quarter','Ticker','Metric'])['Mean'].unstack().reset_index()
print (df1)
Metric  Year  Quarter Ticker  EPS    REV
0       2009        4   NVID  1.4  350.0
1       2010        1   ATVI  0.2  500.0
2       2010        2   ATVI  0.9  120.0
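In pandas 1.1 and later, `DataFrame.pivot` accepts a list for `index`, so the same long-to-wide reshape can be written in a single call. A minimal sketch, assuming pandas >= 1.1, that also compares the result with the set_index + unstack route:

```python
import pandas as pd

raw_data = {'Year': [2009, 2009, 2010, 2010, 2010, 2010],
            'Quarter': [4, 4, 1, 1, 2, 2],
            'Ticker': ['NVID', 'NVID', 'ATVI', 'ATVI', 'ATVI', 'ATVI'],
            'Metric': ['EPS', 'REV', 'EPS', 'REV', 'EPS', 'REV'],
            'Mean': [1.4, 350, 0.2, 500, 0.9, 120]}
df = pd.DataFrame(raw_data)

# pivot() reshapes long -> wide; passing a list to index needs pandas >= 1.1
wide = df.pivot(index=['Year', 'Quarter', 'Ticker'], columns='Metric', values='Mean')
# Turn the index levels back into columns and drop the leftover 'Metric' axis name
wide = wide.reset_index().rename_axis(None, axis=1)

# The set_index + unstack version from the answer, for comparison
via_unstack = (df.set_index(['Year', 'Quarter', 'Ticker', 'Metric'])['Mean']
                 .unstack()
                 .reset_index()
                 .rename_axis(None, axis=1))

print(wide)
print(wide.equals(via_unstack))
```

Like unstack, pivot refuses duplicate keys, so the aggregation routes below still apply when keys repeat.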

But if you get:

ValueError: Index contains duplicate entries, cannot reshape

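then some Year/Quarter/Ticker/Metric combination occurs more than once, so those values have to be combined first. Use groupby + an aggregation function + unstack, or pivot_table: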

raw_data = {'Year': [2009, 2009, 2010, 2010, 2010, 2010],
    'Quarter': [4, 4, 1, 1, 2, 2],       
    'Sector': [ 'Gas', 'Gas', 'Future', 'Gas', 'Beer', 'Future'],
    'Ticker': ['NVID', 'NVID', 'ATVI', 'ATVI', 'ATVI', 'ATVI'],
    'Metric': ['EPS', 'EPS', 'EPS', 'REV', 'EPS', 'REV'],
    'Mean': [1.4, 2, 0.2, 500, 0.9, 120],
    } 
df = pd.DataFrame(raw_data, columns = ['Year','Quarter', 'Ticker', 'Metric','Mean'])
print(df)
   Year  Quarter Ticker Metric   Mean
0  2009        4   NVID    EPS    1.4 <-duplicate index and columns values
1  2009        4   NVID    EPS    2.0 <-duplicate index and columns values
2  2010        1   ATVI    EPS    0.2
3  2010        1   ATVI    REV  500.0
4  2010        2   ATVI    EPS    0.9
5  2010        2   ATVI    REV  120.0
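To see which rows trigger the error before choosing an aggregation, `DataFrame.duplicated` with `keep=False` flags every row whose key columns repeat. A small sketch on the sample above (`key_cols` and `reshaped` are illustrative names):

```python
import pandas as pd

# Sample with a repeated 2009 / Q4 / NVID / EPS key
raw_data = {'Year': [2009, 2009, 2010, 2010, 2010, 2010],
            'Quarter': [4, 4, 1, 1, 2, 2],
            'Ticker': ['NVID', 'NVID', 'ATVI', 'ATVI', 'ATVI', 'ATVI'],
            'Metric': ['EPS', 'EPS', 'EPS', 'REV', 'EPS', 'REV'],
            'Mean': [1.4, 2, 0.2, 500, 0.9, 120]}
df = pd.DataFrame(raw_data)

key_cols = ['Year', 'Quarter', 'Ticker', 'Metric']

# keep=False marks every occurrence of a repeated key, not only the later ones
dupes = df[df.duplicated(subset=key_cols, keep=False)]
print(dupes)

# Reshaping on these keys without aggregating raises the ValueError quoted above
try:
    df.set_index(key_cols)['Mean'].unstack()
    reshaped = True
except ValueError as exc:
    reshaped = False
    print('unstack failed:', exc)
```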

df1 = df.groupby(['Year','Quarter','Ticker','Metric'])['Mean'].mean().unstack().reset_index()
print (df1)
Metric  Year  Quarter Ticker  EPS    REV
0       2009        4   NVID  1.7    NaN
1       2010        1   ATVI  0.2  500.0
2       2010        2   ATVI  0.9  120.0

Or, with pivot_table:

df1 = df.pivot_table(index=['Year','Quarter','Ticker'], 
                     columns='Metric', 
                     values='Mean', 
                     aggfunc='mean')
df1 = df1.reset_index().rename_axis(None, axis=1)
print (df1)
   Year  Quarter Ticker  EPS    REV
0  2009        4   NVID  1.7    NaN
1  2010        1   ATVI  0.2  500.0
2  2010        2   ATVI  0.9  120.0
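The `aggfunc` argument decides how colliding rows are combined, so averaging is only one option. A sketch comparing a few built-in aggregation names on the same duplicated sample (`results` is an illustrative name); which one is appropriate depends on what the duplicate rows mean in your data:

```python
import pandas as pd

raw_data = {'Year': [2009, 2009, 2010, 2010, 2010, 2010],
            'Quarter': [4, 4, 1, 1, 2, 2],
            'Ticker': ['NVID', 'NVID', 'ATVI', 'ATVI', 'ATVI', 'ATVI'],
            'Metric': ['EPS', 'EPS', 'EPS', 'REV', 'EPS', 'REV'],
            'Mean': [1.4, 2, 0.2, 500, 0.9, 120]}
df = pd.DataFrame(raw_data)

results = {}
for func in ['mean', 'first', 'max']:
    wide = (df.pivot_table(index=['Year', 'Quarter', 'Ticker'],
                           columns='Metric', values='Mean', aggfunc=func)
              .reset_index()
              .rename_axis(None, axis=1))
    # Row 0 is 2009 / Q4 / NVID, the key with two EPS rows (1.4 and 2.0)
    results[func] = wide.loc[0, 'EPS']
    print(func, results[func])
```

'mean' averages the two EPS values, 'first' keeps the one that appears first in the frame, and 'max' keeps the larger one.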