Find the sum of each column and combine them to find the top 3 highest values

Posted: 2017-06-25 05:34:48

Tags: pandas

a = pd.DataFrame(df.groupby('actor_1_name')['gross'].sum())
b = pd.DataFrame(df.groupby('actor_2_name')['gross'].sum())
c = pd.DataFrame(df.groupby('actor_3_name')['gross'].sum())

x = [a,b,c]
y = pd.concat(x)

p =['actor_1_name','actor_2_name','actor_3_name','gross']
df.loc[y.nlargest(3).index,p]

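I want to find the sum for each column and then combine them to find the top 3 highest values, but I get an error and I'm not sure how to fix it. Any help would be appreciated. (The error message was posted as a screenshot, which is not reproduced here.)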

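Since the screenshot is not available, the exact error is unknown. A likely cause, shown below on a small hypothetical frame: pd.concat of the three one-column DataFrames returns a DataFrame, and DataFrame.nlargest requires a columns argument, so y.nlargest(3) raises a TypeError. Even with columns='gross', the resulting index holds actor names rather than row labels of df, so the later df.loc[...] lookup would not find them. The sample values are invented for illustration.

```python
import pandas as pd

# Hypothetical sample rows in the same shape as the question's data
df = pd.DataFrame({'actor_1_name': ['a', 'b', 'a'],
                   'actor_2_name': ['b', 'c', 'c'],
                   'actor_3_name': ['c', 'a', 'b'],
                   'gross': [5, 7, 1]})

# The question's construction: three one-column DataFrames stacked vertically
a = pd.DataFrame(df.groupby('actor_1_name')['gross'].sum())
b = pd.DataFrame(df.groupby('actor_2_name')['gross'].sum())
c = pd.DataFrame(df.groupby('actor_3_name')['gross'].sum())
y = pd.concat([a, b, c])

# DataFrame.nlargest(n, columns) needs to know which column to rank by
try:
    y.nlargest(3)
    error = None
except TypeError as exc:
    error = exc
print('y.nlargest(3) raised:', repr(error))

# Naming the column avoids the TypeError, but the index holds actor names,
# which are not row labels of df (df has a 0, 1, 2 index)
top3 = y.nlargest(3, 'gross')
print(top3)
```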

1 answer:

Answer 0 (score: 0)

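I believe you need to call nlargest on each column's grouped sums separately and then concatenate the results. Each sum is a Series, so nlargest(3) needs no column argument: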

import pandas as pd

df = pd.DataFrame({'actor_1_name':['a','a','a','b','b','c','c','d','d','e'],
                   'actor_2_name':['d','d','a','c','b','c','c','d','e','e'],
                   'actor_3_name':['c','c','a','b','b','b','c','e','e','e'],
                   'gross':[1,2,3,4,5,6,7,8,9,10]})
print (df)
  actor_1_name actor_2_name actor_3_name  gross
0            a            d            c      1
1            a            d            c      2
2            a            a            a      3
3            b            c            b      4
4            b            b            b      5
5            c            c            b      6
6            c            c            c      7
7            d            d            e      8
8            d            e            e      9
9            e            e            e     10
a = df.groupby('actor_1_name')['gross'].sum().nlargest(3)
b = df.groupby('actor_2_name')['gross'].sum().nlargest(3)
c = df.groupby('actor_3_name')['gross'].sum().nlargest(3)

x = [a,b,c]
print (x)
[actor_1_name
d    17
c    13
e    10
Name: gross, dtype: int64, actor_2_name
e    19
c    17
d    11
Name: gross, dtype: int64, actor_3_name
e    27
b    15
c    10
Name: gross, dtype: int64]
df1 = pd.concat(x, axis=1, keys=['actor_1_name','actor_2_name','actor_3_name'])
print (df1)
   actor_1_name  actor_2_name  actor_3_name
b           NaN           NaN          15.0
c          13.0          17.0          10.0
d          17.0          11.0           NaN
e          10.0          19.0          27.0
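The answer ranks each column separately. If "combine" is instead meant as one total per actor across all three columns, a minimal sketch (an interpretation, not part of the original answer) keeps the per-column sums as Series, concatenates them, and adds up the repeated actor names with groupby(level=0) before calling nlargest. It reuses the answer's sample data; actor_cols and totals are names introduced here.

```python
import pandas as pd

# Sample data from the answer
df = pd.DataFrame({'actor_1_name': ['a','a','a','b','b','c','c','d','d','e'],
                   'actor_2_name': ['d','d','a','c','b','c','c','d','e','e'],
                   'actor_3_name': ['c','c','a','b','b','b','c','e','e','e'],
                   'gross': [1,2,3,4,5,6,7,8,9,10]})

actor_cols = ['actor_1_name', 'actor_2_name', 'actor_3_name']

# Per-column sums as Series (index = actor name), like the question's a, b, c
sums = [df.groupby(col)['gross'].sum() for col in actor_cols]

# Stack them, then add up the entries of actors that appear in several columns
totals = pd.concat(sums).groupby(level=0).sum()

# Series.nlargest needs no column argument
top3 = totals.nlargest(3)
print(top3)  # e 56, c 40, b 29
```

Note that a movie in which the same actor fills several slots (row 2 here) is counted once per slot; drop such duplicates first if that is not what you want.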

EDIT1: To keep the actor names next to the sums, add reset_index():

a = df.groupby('actor_1_name')['gross'].sum().nlargest(3).reset_index()
b = df.groupby('actor_2_name')['gross'].sum().nlargest(3).reset_index()
c = df.groupby('actor_3_name')['gross'].sum().nlargest(3).reset_index()

x = [a,b,c]
print (x)
[  actor_1_name  gross
0            d     17
1            c     13
2            e     10,   actor_2_name  gross
0            e     19
1            c     17
2            d     11,   actor_3_name  gross
0            e     27
1            b     15
2            c     10]

df1 = pd.concat(x, axis=1, keys=['a','b','c'])
df1.columns = df1.columns.map('-'.join)
print (df1)
  a-actor_1_name  a-gross b-actor_2_name  b-gross c-actor_3_name  c-gross
0              d       17              e       19              e       27
1              c       13              c       17              b       15
2              e       10              d       11              c       10
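The three nearly identical groupby lines can also be generated from a list of column names. A sketch of EDIT1 written that way, using the same sample data (actor_cols and parts are names introduced here):

```python
import pandas as pd

# Sample data from the answer
df = pd.DataFrame({'actor_1_name': ['a','a','a','b','b','c','c','d','d','e'],
                   'actor_2_name': ['d','d','a','c','b','c','c','d','e','e'],
                   'actor_3_name': ['c','c','a','b','b','b','c','e','e','e'],
                   'gross': [1,2,3,4,5,6,7,8,9,10]})

actor_cols = ['actor_1_name', 'actor_2_name', 'actor_3_name']

# One two-column frame per actor column: top 3 names and their summed gross
parts = [df.groupby(col)['gross'].sum().nlargest(3).reset_index()
         for col in actor_cols]

# Place them side by side; the keys become prefixes that keep column names unique
df1 = pd.concat(parts, axis=1, keys=['a', 'b', 'c'])
df1.columns = df1.columns.map('-'.join)
print(df1)
```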

EDIT2: If you only need the values, use reset_index(drop=True):

a = df.groupby('actor_1_name')['gross'].sum().nlargest(3).reset_index(drop=True)
b = df.groupby('actor_2_name')['gross'].sum().nlargest(3).reset_index(drop=True)
c = df.groupby('actor_3_name')['gross'].sum().nlargest(3).reset_index(drop=True)

x = [a,b,c]
print (x)
[0    17
1    13
2    10
Name: gross, dtype: int64, 0    19
1    17
2    11
Name: gross, dtype: int64, 0    27
1    15
2    10
Name: gross, dtype: int64]

df1 = pd.concat(x, axis=1, keys=['actor_1_name','actor_2_name','actor_3_name'])
print (df1)
   actor_1_name  actor_2_name  actor_3_name
0            17            19            27
1            13            17            15
2            10            11            10
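An equivalent sketch of EDIT2 builds the table from a dictionary comprehension: pd.DataFrame aligns the three Series on their shared 0, 1, 2 index and uses the dictionary keys as column names, so the keys= argument of concat is not needed. The sample data is the answer's; top_values is a name introduced here.

```python
import pandas as pd

# Sample data from the answer
df = pd.DataFrame({'actor_1_name': ['a','a','a','b','b','c','c','d','d','e'],
                   'actor_2_name': ['d','d','a','c','b','c','c','d','e','e'],
                   'actor_3_name': ['c','c','a','b','b','b','c','e','e','e'],
                   'gross': [1,2,3,4,5,6,7,8,9,10]})

actor_cols = ['actor_1_name', 'actor_2_name', 'actor_3_name']

# Each value is the top 3 sums for one column, re-indexed as 0, 1, 2
top_values = pd.DataFrame({
    col: df.groupby(col)['gross'].sum().nlargest(3).reset_index(drop=True)
    for col in actor_cols
})
print(top_values)
```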