Question

I have a grouped pandas DataFrame that I created with:

Prof_file=prof_claims.groupby(['TC_Code', 'Primary_CPT_Description'])
grp_prof=Prof_file['Total_Case_AMT'].agg([np.sum, np.mean, np.count_nonzero])

Now I want to find the longest string in the 'Primary_CPT_Description' field.

I'm using

grp_prof.ix[grp_prof['Primary_CPT_Description'].idxmax()]

I also tried

print grp_prof.groupby(['Primary_CPT_Description']).idxmax()

but I keep getting the error: KeyError: u'no item named Primary_CPT_Description'

This doesn't make sense to me, because 'Primary_CPT_Description' is definitely a string field in the df.

Answer 1

Your grp_prof probably looks something like this (made-up data, but you get the idea):

>>> grp_prof
                                 sum  mean  count_nonzero
TC_Code Primary_CPT_Description                          
0       5                         15    15              1
1       6                         16    16              1
2       7                         17    17              1
3       8                         18    18              1
4       9                         19    19              1

See how TC_Code and Primary_CPT_Description sit on a lower row than sum, mean and count_nonzero? They aren't columns; they're part of the index:

>>> grp_prof.columns
Index([u'sum', u'mean', u'count_nonzero'], dtype='object')
>>> grp_prof.index
MultiIndex(levels=[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]],
           labels=[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4]],
           names=[u'TC_Code', u'Primary_CPT_Description'])
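If you'd rather not reset the index, a minimal sketch (assuming a recent pandas; the three-row frame is made up to mimic grp_prof) can read the level straight out of the MultiIndex with index.get_level_values and locate the longest label by position:

```python
import pandas as pd

# Made-up grouped frame: the grouping keys live in a MultiIndex, not in columns.
grp_prof = pd.DataFrame(
    {"sum": [15, 16, 17], "mean": [15, 16, 17], "count_nonzero": [1, 1, 1]},
    index=pd.MultiIndex.from_tuples(
        [(0, "A"), (1, "BC"), (2, "CDE")],
        names=["TC_Code", "Primary_CPT_Description"],
    ),
)

print("Primary_CPT_Description" in grp_prof.columns)      # False
print("Primary_CPT_Description" in grp_prof.index.names)  # True

# Pull that level out of the index as an Index of strings.
desc = grp_prof.index.get_level_values("Primary_CPT_Description")

# Index.str.len() gives the length of each label; argmax() gives its position.
pos = desc.str.len().argmax()
print(desc[pos])  # CDE

# The same position selects the whole aggregated row.
row = grp_prof.iloc[pos]
```

Because argmax is positional, iloc (not loc) is the matching accessor here.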

I would probably use .reset_index():

>>> grp_prof = grp_prof.reset_index()
>>> grp_prof
   TC_Code  Primary_CPT_Description  sum  mean  count_nonzero
0        0                        5   15    15              1
1        1                        6   16    16              1
2        2                        7   17    17              1
3        3                        8   18    18              1
4        4                        9   19    19              1
>>> grp_prof["Primary_CPT_Description"].idxmax()
4
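Note that idxmax returns the index label of the maximum, not the value itself. A small sketch using the same made-up numbers shows how that label plugs into .loc to fetch the rest of the row after reset_index:

```python
import pandas as pd

# Same made-up numbers as above, with the grouping keys in a MultiIndex.
grp_prof = pd.DataFrame(
    {"sum": [15, 16, 17, 18, 19],
     "mean": [15, 16, 17, 18, 19],
     "count_nonzero": [1, 1, 1, 1, 1]},
    index=pd.MultiIndex.from_arrays(
        [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]],
        names=["TC_Code", "Primary_CPT_Description"],
    ),
)

flat = grp_prof.reset_index()  # index levels become ordinary leading columns
print(list(flat.columns))
# ['TC_Code', 'Primary_CPT_Description', 'sum', 'mean', 'count_nonzero']

label = flat["Primary_CPT_Description"].idxmax()  # a row label, not the value
print(label)                       # 4
print(flat.loc[label, "TC_Code"])  # 4
```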

To get the longest string, you first need a Series of the string lengths (new fake data):

>>> df["Primary_CPT_Description"]
0      A
1     BC
2    CDE
3      F
4     GH
Name: Primary_CPT_Description, dtype: object
>>> df["Primary_CPT_Description"].apply(len)
0    1
1    2
2    3
3    1
4    2
Name: Primary_CPT_Description, dtype: int64
>>> df["Primary_CPT_Description"].apply(len).idxmax()
2
>>> df["Primary_CPT_Description"].str.len().idxmax()
2

