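pandas groupby works on a Series but not when selecting the whole DataFrame

Posted: 2015-01-29 09:13:36

Tags: python pandas

I'd like to understand this behaviour.

I have a DataFrame holdings with various columns, for example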

[u'date', u'portfolio', u'sector', u'industry', u'instrument', u'name', u'position', u'price', u'pct_chg', u'mv']

where mv is the market value.

When I do

holdings['wt'] = holdings.groupby(['holdings.portfolio','holdings.date']).apply(lambda x: x['mv']/sum(x['mv']) )

I get the error

/usr/local/lib/python2.7/dist-packages/pandas/core/frame.pyc in reindexer(value)
   2234 
   2235                     # other
-> 2236                     raise TypeError('incompatible index of inserted column '
   2237                                     'with frame index')
   2238             return value

TypeError: incompatible index of inserted column with frame index

But when I do

holdings['wt'] = holdings['mv'].groupby([holdings['holdings.portfolio'],holdings['holdings.date']]).apply(lambda x: x/sum(x) )

it works fine.

The former looks a bit neater to me. Am I coding it wrong, or is this expected? Thanks


CSV data dump below:

,holdings.date,holdings.portfolio,static_data.sector,static_data.industry,holdings.instrument,static_data.name,holdings.position,prices.adjclose,pct_chg,mv
0,2013-01-14 00:00:00,SP500,Health Care,Health Care Equipment & Services,A,Agilent Technologies Inc,333512000.0,30.61,0.0026203734032099746,10208802320.0
20072,2013-01-14 00:00:00,SP500,Consumer Discretionary,"Apparel, Accessories & Luxury Goods",RL,Polo Ralph Lauren Corp.,87704000.0,163.35,0.002454740718011772,14326448400.0
3432,2013-01-14 00:00:00,SP500,Information Technology,Semiconductors,BRCM,Broadcom Corporation,592000000.0,33.74,-0.005599764220453829,19974080000.0
20020,2013-01-14 00:00:00,SP500,Energy,Oil & Gas Drilling,RIG,Transocean,362189000.0,49.65,-0.0028118096003213466,17982683850.0
19968,2013-01-14 00:00:00,SP500,Information Technology,Systems Software,RHT,Red Hat Inc.,187822000.0,54.99,0.009917355371900749,10328331780.0
3484,2013-01-14 00:00:00,usequity,Health Care,Health Care Equipment & Services,BSX,Boston Scientific,849000.0,6.32,-0.0062893081761006275,5365680.0
19916,2013-01-14 00:00:00,usequity,Industrials,Industrial Conglomerates,RHI,Robert Half International,60000.0,32.28,0.011278195488721776,1936800.0
3536,2013-01-14 00:00:00,SP500,Consumer Discretionary,Auto Parts & Equipment,BWA,BorgWarner,227373000.0,35.57,0.003668171557562161,8087657610.0
19864,2013-01-14 00:00:00,SP500,Financials,Diversified Financial Services,RF,Regions Financial Corp.,1379000000.0,7.06,-0.007032348804500765,9735740000.0
19812,2013-01-14 00:00:00,SP500,Health Care,Biotechnology,REGN,Regeneron,100390000.0,179.4,-0.00033433634236046395,18009966000.0
3588,2013-01-14 00:00:00,SP500,Financials,REITs,BXP,Boston Properties,153099000.0,100.68,0.003388479170819192,15414007320.000002
19760,2013-01-14 00:00:00,SP500,Consumer Staples,Tobacco,RAI,Reynolds American Inc.,531283000.0,39.13,0.0017921146953405742,20789103790.0
19708,2013-01-14 00:00:00,SP500,Industrials,Industrial Conglomerates,R,Ryder System,53039000.0,51.47,0.0027274498344047604,2729917330.0
3640,2013-01-14 00:00:00,SP500,Financials,Banks,C,Citigroup Inc.,3029500000.0,42.15,-0.002838892831795725,127693425000.0
19656,2013-01-14 00:00:00,SP500,Energy,Oil & Gas Exploration & Production,QEP,QEP Resources,180091000.0,29.17,-0.004776526782667934,5253254470.0
3692,2013-01-14 00:00:00,SP500,Information Technology,Systems Software,CA,"CA, Inc.",444906000.0,22.19,0.009554140127388644,9872464140.0
19604,2013-01-14 00:00:00,SP500,Information Technology,Semiconductors,QCOM,QUALCOMM Inc.,1676023000.0,62.05,-0.010208964747168592,103997227150.0
19552,2013-01-14 00:00:00,SP500,Energy,Oil & Gas Exploration & Production,PXD,Pioneer Natural Resources,143098000.0,111.63,-0.0009844281367460406,15974029740.0
3744,2013-01-14 00:00:00,SP500,Consumer Staples,Packaged Foods & Meats,CAG,ConAgra Foods Inc.,424827000.0,29.21,0.0075888237323216146,12409196670.0
19500,2013-01-14 00:00:00,SP500,Materials,Industrial Gases,PX,Praxair Inc.,291372000.0,110.15,0.0009086778736937529,32094625800.0
19448,2013-01-14 00:00:00,SP500,Industrials,Industrial Conglomerates,PWR,Quanta Services Inc.,216795000.0,28.66,-0.012405237767057153,6213344700.0
3796,2013-01-14 00:00:00,SP500,Health Care,Health Care Distributors & Services,CAH,Cardinal Health Inc.,336000000.0,41.62,0.003133285128946728,13984320000.0
19396,2013-01-14 00:00:00,SP500,Consumer Discretionary,"Apparel, Accessories & Luxury Goods",PVH,PVH Corp.,82393000.0,117.49,0.002303361201160259,9680353570.0
3848,2013-01-14 00:00:00,SP500,Energy,Oil & Gas Equipment & Services,CAM,Cameron International Corp.,198303000.0,57.44,-0.0019113814074717128,11390524320.0
19344,2013-01-14 00:00:00,SP500,Energy,Oil & Gas Refining & Marketing & Transportation,PSX,Phillips 66,553513000.0,49.42,0.015409903431271799,27354612460.0
19292,2013-01-14 00:00:00,SP500,Financials,REITs,PSA,Public Storage,172418000.0,139.16,-0.005005005005005114,23993688880.0
20124,2013-01-14 00:00:00,SP500,Industrials,Industrial Conglomerates,ROK,Rockwell Automation Inc.,137872000.0,82.65,-0.0018115942028984477,11395120800.0
3900,2013-01-14 00:00:00,SP500,Industrials,Construction & Farm Machinery & Heavy Trucks,CAT,Caterpillar Inc.,611500000.0,90.32,-0.005943209333039934,55230679999.99999
3380,2013-01-14 00:00:00,SP500,Health Care,Health Care Distributors & Services,BMY,Bristol-Myers Squibb,1658776000.0,32.49,0.00277777777777799,53893632240.0
3328,2013-01-14 00:00:00,SP500,Materials,Paper Packaging,BMS,Bemis Company,99880000.0,33.34,0.008469449485783542,3329999200.0000005
21008,2013-01-14 00:00:00,SP500,Consumer Discretionary,Broadcasting & Cable TV,SNI,Scripps Networks Interactive Inc.,140122000.0,58.2,-0.011381009002887632,8155100400.0
2860,2013-01-14 00:00:00,SP500,Consumer Discretionary,Computer & Electronics Retail,BBY,Best Buy Co. Inc.,349615000.0,13.9,0.019061583577712593,4859648500.0
20956,2013-01-14 00:00:00,SP500,Information Technology,Computer Storage & Peripherals,SNDK,SanDisk Corporation,222201000.0,46.04,0.008985316677624366,10230134040.0
20904,2013-01-14 00:00:00,SP500,Consumer Discretionary,Household Appliances,SNA,Snap-On Inc.,58107000.0,77.47,0.0014219234746639664,4501549290.0
2912,2013-01-14 00:00:00,SP500,Health Care,Health Care Equipment & Services,BCR,Bard (C.R.) Inc.,74898000.0,101.28,-0.004423473901503994,7585669440.0
20852,2013-01-14 00:00:00,SP500,Energy,Oil & Gas Equipment & Services,SLB,Schlumberger Ltd.,1286793000.0,70.8,-0.01324041811846699,91104944400.0
2964,2013-01-14 00:00:00,SP500,Health Care,Health Care Equipment & Services,BDX,Becton Dickinson,191835000.0,79.49,0.006584779030011312,15248964149.999998
20800,2013-01-14 00:00:00,SP500,Consumer Staples,Packaged Foods & Meats,SJM,Smucker (J.M.),101817000.0,84.88,0.0009433962264151496,8642226960.0
20748,2013-01-14 00:00:00,SP500,Materials,Diversified Chemicals,SIAL,Sigma-Aldrich,119085000.0,75.15,0.0009323388385722442,8949237750.0
3016,2013-01-14 00:00:00,SP500,Financials,Diversified Financial Services,BEN,Franklin Resources,622900000.0,44.54,-0.0006730984967466824,27743966000.0
20696,2013-01-14 00:00:00,SP500,Materials,Specialty Chemicals,SHW,Sherwin-Williams,95997000.0,158.08,-0.0006321911746112185,15175205760.000002
20644,2013-01-14 00:00:00,SP500,Materials,Paper Packaging,SEE,Sealed Air Corp.(New),210399000.0,17.77,0.006228765571913986,3738790230.0
3068,2013-01-14 00:00:00,SP500,Energy,Oil & Gas Equipment & Services,BHI,Baker Hughes Inc,432598000.0,41.06,-0.023078753271472685,17762473880.0
20592,2013-01-14 00:00:00,SP500,Energy,Oil & Gas Refining & Marketing & Transportation,SE,Spectra Energy Corp.,670893000.0,25.99,0.0034749034749035346,17436509070.0
3120,2013-01-14 00:00:00,SP500,Health Care,Biotechnology,BIIB,BIOGEN IDEC Inc.,236155000.0,143.88,0.0006259127894847616,33977981400.0
20540,2013-01-14 00:00:00,SP500,Financials,Diversified Financial Services,SCHW,Charles Schwab Corporation,1303355000.0,14.95,-0.0073041168658699585,19485157250.0
20488,2013-01-14 00:00:00,SP500,Utilities,Multi-Utilities & Unregulated Power,SCG,SCANA Corp,142052000.0,43.04,-0.003934274473501587,6113918080.0
3172,2013-01-14 00:00:00,SP500,Financials,Banks,BK,The Bank of New York Mellon Corp.,1125709000.0,25.71,-0.002328288707799664,28941978390.0
20436,2013-01-14 00:00:00,SP500,Consumer Discretionary,Restaurants,SBUX,Starbucks Corp.,749500000.0,53.37,-0.006330292310556707,40000815000.0
3224,2013-01-14 00:00:00,SP500,Financials,Diversified Financial Services,BLK,BlackRock,167610000.0,212.71,0.005340769448908267,35652323100.0
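To reproduce the problem, the dump above can be read back with pandas.read_csv. This is a minimal sketch, assuming you paste the dump in as text; only four of its rows are inlined here (two SP500 rows and the two usequity rows), and the name CSV_SAMPLE is just illustrative.

```python
import io

import pandas as pd

# Four rows copied verbatim from the dump; the unnamed first column
# holds the original row labels.
CSV_SAMPLE = """\
,holdings.date,holdings.portfolio,static_data.sector,static_data.industry,holdings.instrument,static_data.name,holdings.position,prices.adjclose,pct_chg,mv
0,2013-01-14 00:00:00,SP500,Health Care,Health Care Equipment & Services,A,Agilent Technologies Inc,333512000.0,30.61,0.0026203734032099746,10208802320.0
20072,2013-01-14 00:00:00,SP500,Consumer Discretionary,"Apparel, Accessories & Luxury Goods",RL,Polo Ralph Lauren Corp.,87704000.0,163.35,0.002454740718011772,14326448400.0
3484,2013-01-14 00:00:00,usequity,Health Care,Health Care Equipment & Services,BSX,Boston Scientific,849000.0,6.32,-0.0062893081761006275,5365680.0
19916,2013-01-14 00:00:00,usequity,Industrials,Industrial Conglomerates,RHI,Robert Half International,60000.0,32.28,0.011278195488721776,1936800.0
"""

# index_col=0 restores the original row labels (0, 20072, ...);
# parse_dates turns the date strings into Timestamps.
holdings = pd.read_csv(
    io.StringIO(CSV_SAMPLE),
    index_col=0,
    parse_dates=["holdings.date"],
)
print(holdings[["holdings.portfolio", "holdings.date", "mv"]])
```

Keeping the original, non-consecutive row labels matters here: the whole question is about whether a groupby result lines up with that index.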

1 Answer

Answer (score: 3)

OK, let's look at what you tried:

holdings['wt'] = holdings.groupby(['holdings.portfolio','holdings.date']).apply(lambda x: x['mv']/sum(x['mv']) )

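This fails because, when you call apply on the DataFrame groupby and your function returns a Series for each group, pandas concatenates those Series and prepends the group keys to the index. The result therefore carries a MultiIndex (portfolio, date, original row label), which is not compatible with the index of the original df, so it cannot be assigned back as a column.

If you want to assign the result of a groupby operation back to the original df, call transform instead: it returns a result with the same index as the original df.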
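The index mismatch can be seen directly on a small frame. This is a sketch using a hypothetical four-row version of holdings (two groups, non-consecutive row labels, mv values taken from the dump); it uses x["mv"].sum() rather than the builtin sum, which gives the same weights.

```python
import pandas as pd

# Hypothetical four-row stand-in for `holdings`: two (portfolio, date) groups.
holdings = pd.DataFrame(
    {
        "holdings.portfolio": ["SP500", "SP500", "usequity", "usequity"],
        "holdings.date": ["2013-01-14"] * 4,
        "mv": [10208802320.0, 14326448400.0, 5365680.0, 1936800.0],
    },
    index=[0, 20072, 3484, 19916],
)
keys = ["holdings.portfolio", "holdings.date"]

# DataFrameGroupBy.apply concatenates the per-group Series and prepends the
# group keys, so the index becomes (portfolio, date, original label).
wrong = holdings.groupby(keys).apply(lambda x: x["mv"] / x["mv"].sum())
print(wrong.index.nlevels, holdings.index.nlevels)

try:
    holdings["wt"] = wrong
except Exception as exc:
    # The question's pandas raised TypeError here; the exact exception
    # type and message may differ between pandas versions.
    print(type(exc).__name__, exc)
```

Newer pandas versions may also emit a deprecation warning about apply operating on the grouping columns; it does not change the shape of the result.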

In [174]:

holdings['wt'] = holdings.groupby(['holdings.portfolio','holdings.date'])['mv'].transform(lambda x: x/sum(x))
holdings['wt']

Out[174]:
0        0.009482
20072    0.013306
3432     0.018552
20020    0.016702
19968    0.009593
3484     0.734775
19916    0.265225
3536     0.007512
19864    0.009043
19812    0.016728
3588     0.014317
19760    0.019309
19708    0.002536
3640     0.118602
19656    0.004879
3692     0.009170
19604    0.096593
19552    0.014837
3744     0.011526
19500    0.029810
19448    0.005771
3796     0.012989
19396    0.008991
3848     0.010580
19344    0.025407
19292    0.022285
20124    0.010584
3900     0.051298
3380     0.050057
3328     0.003093
21008    0.007574
2860     0.004514
20956    0.009502
20904    0.004181
2912     0.007046
20852    0.084619
2964     0.014163
20800    0.008027
20748    0.008312
3016     0.025769
20696    0.014095
20644    0.003473
3068     0.016498
20592    0.016195
3120     0.031559
20540    0.018098
20488    0.005679
3172     0.026881
20436    0.037153
3224     0.033114
Name: wt, dtype: float64
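The same transform call on a hypothetical four-row version of holdings (mv values from the dump) shows that the result stays aligned with the original index. The second computation, dividing by transform("sum"), is an alternative idiom and not part of the answer above; it broadcasts each group's total back to every row and divides in one vectorised step.

```python
import pandas as pd

# Hypothetical four-row stand-in for `holdings`: two (portfolio, date) groups.
holdings = pd.DataFrame(
    {
        "holdings.portfolio": ["SP500", "SP500", "usequity", "usequity"],
        "holdings.date": ["2013-01-14"] * 4,
        "mv": [10208802320.0, 14326448400.0, 5365680.0, 1936800.0],
    },
    index=[0, 20072, 3484, 19916],
)
keys = ["holdings.portfolio", "holdings.date"]

# transform returns one value per input row, indexed like holdings,
# so it can be assigned straight back as a new column.
holdings["wt"] = holdings.groupby(keys)["mv"].transform(lambda x: x / x.sum())

# Alternative: broadcast each group's total, then divide element-wise.
wt_vectorised = holdings["mv"] / holdings.groupby(keys)["mv"].transform("sum")

print(holdings["wt"])
print(holdings.groupby(keys)["wt"].sum())  # each group's weights add up to 1
```

The two usequity rows (3484 and 19916) come out as roughly 0.734775 and 0.265225, the same values as in the output above, because those are the only usequity rows in the dump.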

The other thing you did is a bit unusual:

holdings['wt'] = holdings['mv'].groupby([holdings['holdings.portfolio'],holdings['holdings.date']]).apply(lambda x: x/sum(x) )

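Instead of passing column names, you passed a list of 2 Series and called groupby on the column 'mv', which is itself a Series. That still groups the rows: the values of the two Series, aligned on the index, are used as the group keys. It works because your function returns a Series indexed like each group, and in that case SeriesGroupBy.apply (in the pandas version used here) does not prepend the group keys, so the result is a Series whose index is compatible with the original df.

We can check that my transform approach gives the same result as your second one: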
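A sketch on the same kind of hypothetical four-row frame shows that grouping a Series by a list of other Series does form real groups. One caveat for newer pandas: since pandas 2.0, apply prepends the group keys even for transform-like functions unless group_keys=False is passed, which is why the sketch sets it explicitly.

```python
import pandas as pd

# Hypothetical four-row stand-in for `holdings`: two (portfolio, date) groups.
holdings = pd.DataFrame(
    {
        "holdings.portfolio": ["SP500", "SP500", "usequity", "usequity"],
        "holdings.date": ["2013-01-14"] * 4,
        "mv": [10208802320.0, 14326448400.0, 5365680.0, 1936800.0],
    },
    index=[0, 20072, 3484, 19916],
)

# The values of the two key Series (aligned on the index) define the groups.
grouped = holdings["mv"].groupby(
    [holdings["holdings.portfolio"], holdings["holdings.date"]],
    group_keys=False,  # keep the result on holdings' index in pandas >= 2.0
)
print(grouped.ngroups)

via_apply = grouped.apply(lambda x: x / x.sum())
via_transform = holdings.groupby(
    ["holdings.portfolio", "holdings.date"]
)["mv"].transform(lambda x: x / x.sum())

print(via_apply.index.equals(holdings.index))
```

transform does not depend on the group_keys setting, which is one reason it is the more robust choice for assigning per-group results back to a frame.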

In [178]:

holdings['wt'].equals(holdings['mv'].groupby([holdings['holdings.portfolio'],holdings['holdings.date']]).apply(lambda x: x/sum(x) ))
Out[178]:
True