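When I run my code, I get the following error: The column label 'Avg_Threat_Score' is not unique.
I am creating a pivot table and want to sort its values from highest to lowest.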
pt = df.pivot_table(index='User Name', values=['Threat Score', 'Score'],
                    aggfunc={
                        'Threat Score': np.mean,
                        'Score': [np.mean, lambda x: len(x.dropna())]
                    },
                    margins=False)
new_col = ['User Name Count', 'AVG_TH_Score', 'Avg_Threat_Score']
pt.columns = [new_col]
# The code works up to this point; the statement below raises the error
df = df.reindex(pt.sort_values(by='Avg_Threat_Score', ascending=False).index)
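I need to sort by the values of the 'Avg_Threat_Score' column from highest to lowest.
Answer 0 (score: 2)
You need to pass the new column names as a flat list, not a nested list, because pandas builds a one-level MultiIndex from a nested list: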
new_col =['User Name Count', 'AVG_TH_Score', 'Avg_Threat_Score']
pt.columns = [new_col]
is the same as:
pt.columns = [['User Name Count', 'AVG_TH_Score', 'Avg_Threat_Score']]
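and sorting by 'Avg_Threat_Score' then raises:
ValueError: The column label 'Avg_Threat_Score' is not unique.
For a multi-index, the label must be a tuple with elements corresponding to each level.
So assign the flat list instead: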
pt.columns = ['User Name Count', 'AVG_TH_Score', 'Avg_Threat_Score']
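The difference between the two assignments can be checked directly. The following is a minimal sketch, assuming a small stand-in frame (the numbers are copied from the sample output in this answer; the names nested and flat are only for illustration):

```python
import pandas as pd

# Stand-in for the pivot table; values taken from the sample output.
pt = pd.DataFrame([[2.0, 5.5, 4.25], [2.0, 6.0, 5.0]], index=["a", "b"])
new_col = ["User Name Count", "AVG_TH_Score", "Avg_Threat_Score"]

nested = pt.copy()
nested.columns = [new_col]   # list of lists -> MultiIndex with one level

flat = pt.copy()
flat.columns = new_col       # flat list -> ordinary Index

print(type(nested.columns).__name__, nested.columns.nlevels)
print(type(flat.columns).__name__)

# With the flat Index, sorting by the plain string label works.
sorted_flat = flat.sort_values(by="Avg_Threat_Score", ascending=False)
print(list(sorted_flat.index))
```

With the nested assignment every column label becomes a one-element tuple, which is why the plain string label is rejected in sort_values.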
Sample:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'User Name': list('ababaa'),
    'Threat Score': [4, 5, 4, np.nan, 5, 4],
    'Score': [np.nan, 8, 9, 4, 2, np.nan],
    'D': [1, 3, 5, 7, 1, 0]})
pt = (df.pivot_table(index='User Name', values=['Threat Score', 'Score'],
                     aggfunc={
                         'Threat Score': np.mean,
                         'Score': [np.mean, lambda x: len(x.dropna())]
                     },
                     margins=False))
# Flat list of names, not a nested list
pt.columns = ['User Name Count', 'AVG_TH_Score', 'Avg_Threat_Score']
print (pt)
           User Name Count  AVG_TH_Score  Avg_Threat_Score
User Name
a                      2.0           5.5              4.25
b                      2.0           6.0              5.00
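Then, to sort the rows of df in the order given by Avg_Threat_Score, convert the User Name column to an ordered Categorical whose categories follow that order, so that the final sort_values works: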
names = pt.sort_values(by = 'Avg_Threat_Score',ascending=False).index
print (names)
#Index(['b', 'a'], dtype='object', name='User Name')
df['User Name'] = pd.CategoricalIndex(df['User Name'], categories=names, ordered=True)
df = df.sort_values('User Name')
print (df)
  User Name  Threat Score  Score  D
1         b           5.0    8.0  3
3         b           NaN    4.0  7
0         a           4.0    NaN  1
2         a           4.0    9.0  5
4         a           5.0    2.0  1
5         a           4.0    NaN  0
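If changing the dtype of User Name is not wanted, a different route may also work. This is only a sketch of an alternative technique, not what the answer above does: it broadcasts each user's mean Threat Score onto the rows with groupby(...).transform and sorts on that. The helper column name _avg_threat is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "User Name": list("ababaa"),
    "Threat Score": [4, 5, 4, np.nan, 5, 4],
    "Score": [np.nan, 8, 9, 4, 2, np.nan],
})

# Hypothetical helper column: every row receives its user's mean
# Threat Score (NaN values are skipped by the mean).
df["_avg_threat"] = df.groupby("User Name")["Threat Score"].transform("mean")

# Request a stable sort so rows of the same user are not shuffled
# arbitrarily, then drop the helper column again.
out = (df.sort_values("_avg_threat", ascending=False, kind="mergesort")
         .drop(columns="_avg_threat"))
print(out)
```

User Name stays an ordinary object column here, at the cost of a temporary extra column.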
Answer 1 (score: 0)
pt = df.pivot_table(index='User Name', values=['Threat Score', 'Score', 'Source IP'],
                    aggfunc={'Source IP': 'count',
                             'Threat Score': np.mean,
                             'Score': np.mean})
pt = pt.sort_values('Threat Score', ascending=False)
new_cols = ['Avg_Score', 'Count', 'Avg_ThreatScore']
pt.columns = new_cols
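The positional rename above depends on the order in which pivot_table returns the columns. As a hedged alternative, and not part of this answer, named aggregation with groupby(...).agg (available since pandas 0.25) attaches the final names at aggregation time. The sample data, including the Source IP values, is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "User Name": list("ababaa"),
    "Threat Score": [4, 5, 4, np.nan, 5, 4],
    "Score": [np.nan, 8, 9, 4, 2, np.nan],
    "Source IP": ["10.0.0.1"] * 6,   # made-up values
})

# Each keyword is an output column name; the tuple is (input column, function).
pt = (df.groupby("User Name")
        .agg(Avg_Score=("Score", "mean"),
             Count=("Source IP", "count"),
             Avg_ThreatScore=("Threat Score", "mean"))
        .sort_values("Avg_ThreatScore", ascending=False))
print(pt)
```

Because the names are given explicitly, no separate assignment to pt.columns is needed.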