Beginner pandas question:
I have this:
import numpy as np
import pandas as pd
raw_data = {'Year': [2009, 2009, 2010, 2010, 2010, 2010],
'Quarter': [4, 4, 1, 1, 2, 2],
'Sector': [ 'Gas', 'Future', 'Future', 'Gas', 'Beer', 'Future'],
'Ticker': ['NVID', 'NVID', 'ATVI', 'ATVI', 'ATVI', 'ATVI'],
'Metric': ['EPS', 'REV', 'EPS', 'REV', 'EPS', 'REV'],
'Mean': [1.4, 350, 0.2, 500, 0.9, 120],
}
df = pd.DataFrame(raw_data, columns = ['Year','Quarter', 'Ticker', 'Metric','Mean'])
print(df)
I am looking for a DataFrame like this:
Year Quarter Ticker EPS REV
0 2009 4 NVID 1.4 350
1 2010 1 ATVI 0.2 500
2 2010 2 ATVI 0.9 120
I tried splitting EPS and REV into their own DataFrames and then merging/joining them, but ran into problems:
REV_df = df.where(df['metric']=='revenue', axis=0)
REV_df['Year']=df['Year']
REV_df['Quarter']=df['Quarter']
EPS_df = df.where(df['metric']=='EPS', axis=0)
EPS_df['Year']=df['Year']
EPS_df['Quarter']=df['Quarter']
result = pd.merge(EPS_df, UAA_REV_df, left_on='Year', right_index=True, how='left')
Answer 0 (score: 0)
df1 = df.set_index(['Year','Quarter','Ticker','Metric'])['Mean'].unstack().reset_index()
print (df1)
Metric Year Quarter Ticker EPS REV
0 2009 4 NVID 1.4 350.0
1 2010 1 ATVI 0.2 500.0
2 2010 2 ATVI 0.9 120.0
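The output above still carries Metric as the name of the columns axis, which the requested frame does not have. As a minimal sketch, assuming the same sample data as in the question, chaining rename_axis(None, axis=1) before reset_index drops that label (the variable name wide is only illustrative):

```python
import pandas as pd

# Sample data from the question (Sector left out, as in the original DataFrame)
raw_data = {'Year': [2009, 2009, 2010, 2010, 2010, 2010],
            'Quarter': [4, 4, 1, 1, 2, 2],
            'Ticker': ['NVID', 'NVID', 'ATVI', 'ATVI', 'ATVI', 'ATVI'],
            'Metric': ['EPS', 'REV', 'EPS', 'REV', 'EPS', 'REV'],
            'Mean': [1.4, 350, 0.2, 500, 0.9, 120]}
df = pd.DataFrame(raw_data)

# Move Metric into the columns, then clear the leftover "Metric" axis name
wide = (df.set_index(['Year', 'Quarter', 'Ticker', 'Metric'])['Mean']
          .unstack()
          .rename_axis(None, axis=1)
          .reset_index())
print(wide)
```

This only changes the label on the columns axis; the values are the same as in df1 above.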
But if you get:
ValueError: Index contains duplicate entries, cannot reshape
then you need groupby + an aggregate function + unstack, or pivot_table:
raw_data = {'Year': [2009, 2009, 2010, 2010, 2010, 2010],
'Quarter': [4, 4, 1, 1, 2, 2],
'Sector': [ 'Gas', 'Gas', 'Future', 'Gas', 'Beer', 'Future'],
'Ticker': ['NVID', 'NVID', 'ATVI', 'ATVI', 'ATVI', 'ATVI'],
'Metric': ['EPS', 'EPS', 'EPS', 'REV', 'EPS', 'REV'],
'Mean': [1.4, 2, 0.2, 500, 0.9, 120],
}
df = pd.DataFrame(raw_data, columns = ['Year','Quarter', 'Ticker', 'Metric','Mean'])
print(df)
Year Quarter Ticker Metric Mean
0 2009 4 NVID EPS 1.4 <- duplicate index and column values
1 2009 4 NVID EPS 2.0 <- duplicate index and column values
2 2010 1 ATVI EPS 0.2
3 2010 1 ATVI REV 500.0
4 2010 2 ATVI EPS 0.9
5 2010 2 ATVI REV 120.0
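If it is not obvious which rows trigger the error, one possible check is to list every row that shares its (Year, Quarter, Ticker, Metric) combination with another row. This is a sketch on the sample data above; the names keys and dupes are only illustrative:

```python
import pandas as pd

# Sample data with a duplicated (Year, Quarter, Ticker, Metric) combination
raw_data = {'Year': [2009, 2009, 2010, 2010, 2010, 2010],
            'Quarter': [4, 4, 1, 1, 2, 2],
            'Ticker': ['NVID', 'NVID', 'ATVI', 'ATVI', 'ATVI', 'ATVI'],
            'Metric': ['EPS', 'EPS', 'EPS', 'REV', 'EPS', 'REV'],
            'Mean': [1.4, 2, 0.2, 500, 0.9, 120]}
df = pd.DataFrame(raw_data)

keys = ['Year', 'Quarter', 'Ticker', 'Metric']

# keep=False marks every member of a duplicated group, not just the repeats
dupes = df[df.duplicated(subset=keys, keep=False)]
print(dupes)
```

An empty result would mean the plain set_index + unstack version should work without aggregation.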
df1 = df.groupby(['Year','Quarter','Ticker','Metric'])['Mean'].mean().unstack().reset_index()
print (df1)
Metric Year Quarter Ticker EPS REV
0 2009 4 NVID 1.7 NaN
1 2010 1 ATVI 0.2 500.0
2 2010 2 ATVI 0.9 120.0
Or:
df1 = df.pivot_table(index=['Year','Quarter','Ticker'],
columns='Metric',
values='Mean',
aggfunc='mean')
df1 = df1.reset_index().rename_axis(None, axis=1)
print (df1)
Year Quarter Ticker EPS REV
0 2009 4 NVID 1.7 NaN
1 2010 1 ATVI 0.2 500.0
2 2010 2 ATVI 0.9 120.0
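For comparison, the split-and-merge idea from the question can also be made to work, although the reshaping approaches above are shorter. The following is only a sketch on the question's original data (no duplicates): it filters rows with boolean indexing instead of where, uses the column name Metric and the value REV as they appear in the data, and merges on all three key columns. The names keys, eps, rev and result are illustrative.

```python
import pandas as pd

# Original sample data from the question (one row per key and metric)
raw_data = {'Year': [2009, 2009, 2010, 2010, 2010, 2010],
            'Quarter': [4, 4, 1, 1, 2, 2],
            'Ticker': ['NVID', 'NVID', 'ATVI', 'ATVI', 'ATVI', 'ATVI'],
            'Metric': ['EPS', 'REV', 'EPS', 'REV', 'EPS', 'REV'],
            'Mean': [1.4, 350, 0.2, 500, 0.9, 120]}
df = pd.DataFrame(raw_data)

keys = ['Year', 'Quarter', 'Ticker']

# One frame per metric, with Mean renamed to the metric it holds
eps = df.loc[df['Metric'] == 'EPS', keys + ['Mean']].rename(columns={'Mean': 'EPS'})
rev = df.loc[df['Metric'] == 'REV', keys + ['Mean']].rename(columns={'Mean': 'REV'})

# Outer merge keeps keys that have only one of the two metrics (the other becomes NaN)
result = eps.merge(rev, on=keys, how='outer')
print(result)
```

Note that with duplicated keys, as in the second data set, a merge would multiply the matching rows rather than aggregate them, so groupby or pivot_table remains the safer choice there.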