Is there a better way to get the maximum value in a column?

Date: 2019-08-01 17:46:59

Tags: pandas pandas-groupby

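I'm looking for a more elegant way to get the list of unique winners (the party with the highest vote total) for each pandas group.

I've downloaded the California election results and prepared the data, which is loaded by a function called create_df.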

df = create_df()
df.head()
    candidate   county  district    office      party   precinct    votes
0   JOHN COX    ALAMEDA NaN         GOVERNOR    REP     200100      49.0
1   JOHN COX    ALAMEDA NaN         GOVERNOR    REP     200200      55.0
2   JOHN COX    ALAMEDA NaN         GOVERNOR    REP     200300      26.0
3   JOHN COX    ALAMEDA NaN         GOVERNOR    REP     200600      28.0
4   JOHN COX    ALAMEDA NaN         GOVERNOR    REP     200700      35.0

My current implementation looks like this:

county_votes = df.query("office == 'GOVERNOR'")\
                 .groupby(["county", "party"], as_index=False)\
                 .votes.sum()
winners = county_votes.reindex(
    county_votes.groupby("county").votes.idxmax().values
)[["county", "party"]]

winners.head()
    county      party
0   ALAMEDA     DEM
2   ALPINE      DEM
5   AMADOR      REP
7   BUTTE       REP
9   CALAVERAS   REP
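
Because create_df is not shown, the snippet above cannot be run as posted. The following is a minimal, self-contained sketch of the same idea, assuming a handful of made-up rows in place of the real election data (the vote counts are invented for illustration). It uses .loc with the labels returned by idxmax, which selects the same rows as the reindex call above.

```python
import pandas as pd

# Made-up precinct-level rows standing in for create_df(); not real election data.
df = pd.DataFrame({
    "county": ["ALAMEDA", "ALAMEDA", "ALAMEDA", "AMADOR", "AMADOR", "AMADOR"],
    "office": ["GOVERNOR"] * 6,
    "party": ["DEM", "REP", "DEM", "DEM", "REP", "REP"],
    "votes": [60.0, 49.0, 30.0, 10.0, 25.0, 5.0],
})

# Total votes per (county, party) for the governor's race.
county_votes = (
    df.query("office == 'GOVERNOR'")
      .groupby(["county", "party"], as_index=False)["votes"]
      .sum()
)

# idxmax gives, for each county, the row label of the largest total;
# .loc then keeps only those rows and the two columns of interest.
winners = county_votes.loc[
    county_votes.groupby("county")["votes"].idxmax(), ["county", "party"]
]
print(winners)
```

If two parties tie within a county, idxmax returns the first of the tied rows, so the result still has one row per county.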

Is there a better way?

1 answer:

Answer 0 (score: 0)

I found another way, and it also seems to be faster.


42.4 ms ± 97 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
county_votes = df.query("office == 'GOVERNOR'")\
    .groupby(["county", "party"], as_index=False)\
    .votes.sum()
county_votes.reindex(
    county_votes.groupby("county").votes.idxmax().values
)[["county", "party"]].head()
  

31.6 ms ± 60.9 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
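
One alternative that is often used for this kind of "top row per group" problem, not necessarily the one this answer refers to, is to sort by votes and keep the first row for each county. The sketch below assumes a small, made-up county_votes frame with the same columns as in the question; no timing claim is made for it.

```python
import pandas as pd

# Hypothetical per-county totals, shaped like county_votes in the question.
county_votes = pd.DataFrame({
    "county": ["ALAMEDA", "ALAMEDA", "AMADOR", "AMADOR"],
    "party": ["DEM", "REP", "DEM", "REP"],
    "votes": [90.0, 49.0, 10.0, 30.0],
})

# Sort so the largest total comes first, keep the first row seen for each
# county, then restore alphabetical county order for display.
alt_winners = (
    county_votes.sort_values("votes", ascending=False)
                .drop_duplicates("county")
                .sort_values("county")[["county", "party"]]
)
print(alt_winners)
```

With tied totals, which row survives depends on the sort order, so the result may differ from the idxmax version in that case.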