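I am trying to evaluate the Herfindahl index using apply. So far I have done it by converting the DataFrame to a numpy matrix. The function evalHerfindahlIndex works fine and returns the correct Herfindahl index for each row. However, when I try to do the same thing through apply (with the function evalHerfindahlIndexForDF), I get a very strange error: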
ValueError: ("No axis named 1 for object type <class 'pandas.core.series.Series'>", 'occurred at index A')
The whole code is this:
import pandas as pd
import numpy as np
import datetime

def evalHerfindahlIndex(x):
    soma=np.sum(x,axis=1)
    y=np.empty(np.shape(x))
    for line in range(len(soma)):
        y[line,:]=np.power(x[line,:]/soma[line],2.0)
    hhi=np.sum(y,axis=1)
    return hhi

def evalHerfindahlIndexForDF(x):
    soma=x.sum(axis=1)

def creatingDataFrame():
    dateList=[]
    dateList.append(datetime.date(2002,1,1))
    dateList.append(datetime.date(2002,2,1))
    dateList.append(datetime.date(2002,1,1))
    dateList.append(datetime.date(2002,1,1))
    dateList.append(datetime.date(2002,2,1))
    raw_data = {'Date': dateList,
                'Company': ['A', 'B', 'B', 'C' , 'C'],
                'var1': [10, 20, 30, 40 , 50]}
    df = pd.DataFrame(raw_data, columns = ['Date','Company', 'var1'])
    df.loc[1, 'var1'] = np.nan
    return df

if __name__=="__main__":
    df=creatingDataFrame()
    print(df)
    dfPivot=df.pivot(index='Date', columns='Company', values='var1')
    #print(dfPivot)
    dfPivot=dfPivot.fillna(0)
    dfPivot['Date']=dfPivot.index
    listOfCompanies=list(set(df['Company']))
    Pivot=dfPivot.as_matrix(columns=listOfCompanies)
    print(evalHerfindahlIndex(Pivot))
    print(dfPivot)
    print(dfPivot[listOfCompanies].apply(evalHerfindahlIndexForDF))
The DataFrame I am working with is dfPivot:
Company        A     B     C        Date
Date
2002-01-01  10.0  30.0  40.0  2002-01-01
2002-02-01   0.0   0.0  50.0  2002-02-01
The correct values of the Herfindahl index, as evaluated by evalHerfindahlIndex, are:
[0.40625 1. ]
I would like to return them as an extra column of the DataFrame dfPivot.
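For reference, the expected values follow from the usual definition of the index as the sum of squared shares within each row (date). The notation here is an assumption and does not appear in the code: x_i is a company's value in the row and s_i is its share of the row total.

```latex
\begin{align*}
H &= \sum_i s_i^2, \qquad s_i = \frac{x_i}{\sum_j x_j} \\
H_{\text{2002-01-01}} &= \left(\tfrac{10}{80}\right)^2 + \left(\tfrac{30}{80}\right)^2 + \left(\tfrac{40}{80}\right)^2
  = 0.015625 + 0.140625 + 0.25 = 0.40625 \\
H_{\text{2002-02-01}} &= \left(\tfrac{0}{50}\right)^2 + \left(\tfrac{0}{50}\right)^2 + \left(\tfrac{50}{50}\right)^2 = 1
\end{align*}
```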
Answer 0 (score: 1)
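Consider updating your function so that it takes the DataFrame, does the array conversion itself, and returns the result as a pandas Series: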
def evalHerfindahlIndex(df):
    # MOVE MATRIX OPERATION WITHIN FCT
    # (.values is used here because as_matrix was removed in pandas 1.0)
    x = df[listOfCompanies].values
    soma = np.sum(x, axis = 1)
    y = np.empty(np.shape(x))
    for line in range(len(soma)):
        y[line,:] = np.power(x[line,:]/soma[line], 2.0)
    hhi = pd.Series(np.sum(y, axis = 1))    # CONVERT TO SERIES
    return hhi
...
if __name__=="__main__":
    df = creatingDataFrame()
    print(df)
    dfPivot = df.pivot(index = 'Date', columns = 'Company', values = 'var1')
    #print(dfPivot)
    dfPivot = dfPivot.fillna(0)
    dfPivot['Date'] = dfPivot.index
    # STILL NEEDED: THE FUNCTION READS THIS MODULE-LEVEL NAME
    listOfCompanies = list(set(df['Company']))
    # ASSIGN SERIES VALUES (.values to IGNORE INDEX)
    dfPivot['HE_Result'] = evalHerfindahlIndex(dfPivot).values
    # OUTPUT
    print(evalHerfindahlIndex(dfPivot))
    # 0    0.40625
    # 1    1.00000
    # dtype: float64
    print(dfPivot)
    # Company        A     B     C        Date  HE_Result
    # Date
    # 2002-01-01  10.0  30.0  40.0  2002-01-01    0.40625
    # 2002-02-01   0.0   0.0  50.0  2002-02-01    1.00000
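
As for why the original apply call failed: by default DataFrame.apply uses axis=0 and hands the function one column at a time as a Series. A Series only has axis 0, so x.sum(axis=1) raises "No axis named 1", and "occurred at index A" refers to the first column, A. If you would rather stay in pandas than convert to an array, a minimal sketch could look like the following. The names hhi_row, companies, hhi_apply and hhi_vec are made up for this illustration, and the frame is rebuilt by hand from the printed output in the question.

```python
import numpy as np
import pandas as pd

# Rebuild the pivoted frame by hand (values copied from the question's output).
dfPivot = pd.DataFrame(
    {"A": [10.0, 0.0], "B": [30.0, 0.0], "C": [40.0, 50.0]},
    index=pd.Index(["2002-01-01", "2002-02-01"], name="Date"),
)
companies = ["A", "B", "C"]  # hypothetical stand-in for listOfCompanies


def hhi_row(row):
    """Herfindahl index of a single row, passed in as a Series."""
    shares = row / row.sum()      # each company's share of the row total
    return (shares ** 2).sum()    # sum of squared shares


# Option 1: row-wise apply. axis=1 passes each row instead of each column.
hhi_apply = dfPivot[companies].apply(hhi_row, axis=1)

# Option 2: vectorized. div(..., axis=0) divides every row by its own total.
shares = dfPivot[companies].div(dfPivot[companies].sum(axis=1), axis=0)
hhi_vec = (shares ** 2).sum(axis=1)

# Both results keep the Date index, so they align when assigned as a column.
dfPivot["HE_Result"] = hhi_vec
print(dfPivot)
```

Neither option guards against a row whose total is zero, which would give NaN. That case does not occur in the sample data, so it is left out here.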