Why does this pandas groupby object have all these extra groups?

Asked: 2012-11-28 20:58:19

Tags: python pandas

Here is a link to the data I'm working with.

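I'm trying to create groups based on the columns I care about (cuepos, targetpos, soa), but when I list the groups, it seems to be creating groups from some other columns as well...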

groups = t.groupby(['cuepos', 'targetpos', 'soa'])
for name, _ in groups:
    print(name)

Output:

(-89, -89, -89.41139261318807)
(-88, -88, -88.44728835230345)
(-88, -88, -88.20648583606493)
(-87, -87, -87.77339640061896)
(-86, -86, -86.8637199297012)
(-85, -85, -85.50514526170076)
(-83, -83, -83.87935779179833)
(-83, -83, -83.86953491773222)
(-81, -81, -81.43570407709822)
(-80, -80, -80.70639872201482)
(-80, -80, -80.38454772926528)
(-79, -79, -79.81516051155803)
(-75, -75, -75.83933409447087)
(-74, -74, -74.53528962061156)
(-73, -73, -73.10397238440302)
(-70, -70, -70.33208101764106)
(-64, -64, -64.18024404177129)
(-61, -61, -61.969216551968344)
(-61, -61, -61.89154280549519)
(-61, -61, -61.81223645812457)
(-61, -61, -61.80055105439692)
(-59, -59, -59.81551441456813)
(-57, -57, -57.67934478380107)
(-53, -53, -53.91038834185852)
(-51, -51, -51.35605559139145)
(-48, -48, -48.63443042074468)
(-48, -48, -48.026567177299825)
(-44, -44, -44.84750981999042)
(-44, -44, -44.20816797871376)
(-43, -43, -43.97185684796753)
(-39, -39, -39.03132145644588)
(-37, -37, -37.09246448040565)
(-37, -37, -37.06406445785262)
(-36, -36, -36.89551705610748)
(-34, -34, -34.23312940622742)
(-33, -33, -33.771084303661524)
(-31, -31, -31.183030415916534)
(-29, -29, -29.062383175092265)
(-22, -22, -22.1763042325164)
(-17, -17, -17.51138905398824)
(-14, -14, -14.673170146200675)
(-9, -9, -9.389620131659427)
(-9, -9, -9.28109130634627)
(-8, -8, -8.871025817651997)
(-8, -8, -8.47526860623043)
(-7, -7, -7.484697635519495)
(-3, -3, -3.265563116265213)
(-2, -2, -2.842961251214575)
(1, 1, -0.1)
(1, 1, 0.1)
(1, 1, 0.4)
(1, 2, -0.1)
(1, 2, 0.4)
(2, 1, -0.1)
(2, 1, 0.4)
(2, 2, -0.1)
(2, 2, 0.1)
(2, 2, 0.4)
(6, 6, 6.928400268960042)
(8, 8, 8.476818809273727)
(11, 11, 11.225720357570507)
(13, 13, 13.949059199458294)
(17, 17, 17.272663104264836)
(18, 18, 18.548979295124248)
(21, 21, 21.075945669054835)
(22, 22, 22.101344720547228)
(22, 22, 22.36405009971824)
(24, 24, 24.658480906080996)
(27, 27, 27.977154868918745)
(33, 33, 33.75660016684323)
(49, 49, 49.59296862775889)
(51, 51, 51.09435632596291)
(52, 52, 52.107845391762766)
(54, 54, 54.22026217046835)
(54, 54, 54.55461208382168)
(56, 56, 56.92397800238861)
(57, 57, 57.15634257840432)
(57, 57, 57.490226928649264)
(57, 57, 57.82030543311612)
(58, 58, 58.20496727209113)
(58, 58, 58.44217165553367)
(58, 58, 58.591804845872765)
(58, 58, 58.84514017314996)
(60, 60, 60.15474896731822)
(60, 60, 60.49526399943247)
(60, 60, 60.621239605283456)
(61, 61, 61.73542327989246)
(61, 61, 61.882729155824705)
(63, 63, 63.15716022529575)
(65, 65, 65.62684954629724)
(67, 67, 67.32622273875754)
(68, 68, 68.72997170017184)
(71, 71, 71.64012395084114)
(71, 71, 71.87357582509455)
(71, 71, 71.91237771102328)
(72, 72, 72.87756472051248)
(73, 73, 73.23547239962096)
(75, 75, 75.20111322246554)
(76, 76, 76.37312687962122)
(78, 78, 78.39727821292199)
(79, 79, 79.27674426299386)
(80, 80, 80.22644745900354)
(82, 82, 82.38562004739285)
(82, 82, 82.75922122217577)
(85, 85, 85.19181215842043)
(85, 85, 85.6896980533089)
(85, 85, 85.84141277449113)
(87, 87, 87.21598891172931)
(87, 87, 87.60810304014197)
(87, 87, 87.80910737578778)

The groups I want are the ones in the middle (they look like (1, 1, -0.1)). What is all the other stuff? What am I doing wrong here?

1 answer:

Answer 0 (score: 0)

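Well, the data doesn't lie. groupby creates one group for every distinct combination of key values, and these are the 3-tuples that actually occur in your data: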

In [9]: df[['cuepos', 'targetpos', 'soa']].drop_duplicates()
Out[9]: 
     cuepos  targetpos        soa
0         2          2   0.400000
1         2          1   0.400000
2         1          1  -0.100000
3         1          1   0.400000
4         1          2  -0.100000
5         1          1   0.100000
8         2          2  -0.100000
12        1          2   0.400000
18        2          2   0.100000
24        2          1  -0.100000
52       85         85  85.689698
77       -3         -3  -3.265563
117     -83        -83 -83.869535
133      11         11  11.225720
26      -88        -88 -88.206486
31      -48        -48 -48.634430
34       63         63  63.157160
55       85         85  85.841413
80      -61        -61 -61.812236
86      -61        -61 -61.891543
89       87         87  87.215989
92       80         80  80.226447
94       58         58  58.204967
126      71         71  71.912378
128      60         60  60.154749
132       8          8   8.476819
139      65         65  65.626850
141      54         54  54.554612
11      -61        -61 -61.800551
39      -33        -33 -33.771084
46       76         76  76.373127
52      -37        -37 -37.064064
55      -44        -44 -44.847510
60      -70        -70 -70.332081
62       61         61  61.735423
63       75         75  75.201113
69       58         58  58.845140
94      -79        -79 -79.815161
109     -29        -29 -29.062383
111     -51        -51 -51.356056
117     -83        -83 -83.879358
123      21         21  21.075946
135     -31        -31 -31.183030
143       6          6   6.928400
4       -17        -17 -17.511389
11       57         57  57.490227
18      -88        -88 -88.447288
36       78         78  78.397278
39      -14        -14 -14.673170
42       52         52  52.107845
49      -87        -87 -87.773396
50       60         60  60.495264
71       33         33  33.756600
74      -61        -61 -61.969217
84       18         18  18.548979
85       -8         -8  -8.475269
98      -59        -59 -59.815514
101     -80        -80 -80.384548
114     -39        -39 -39.031321
119      71         71  71.873576
121     -86        -86 -86.863720
128      68         68  68.729972
130     -34        -34 -34.233129
140      82         82  82.759221
0       -75        -75 -75.839334
15       67         67  67.326223
34      -57        -57 -57.679345
35      -74        -74 -74.535290
42      -48        -48 -48.026567
67       85         85  85.191812
75       72         72  72.877565
80       -7         -7  -7.484698
99       -9         -9  -9.389620
118     -44        -44 -44.208168
130      73         73  73.235472
143      58         58  58.442172
56       22         22  22.364050
67      -85        -85 -85.505145
95       60         60  60.621240
109      54         54  54.220262
111      87         87  87.809107
112     -81        -81 -81.435704
114      71         71  71.640124
119     -22        -22 -22.176304
120      27         27  27.977155
121      56         56  56.923978
128      57         57  57.820305
133      22         22  22.101345
11       61         61  61.882729
13       58         58  58.591805
28       57         57  57.156343
78      -80        -80 -80.706399
80       49         49  49.592969
81      -37        -37 -37.092464
101     -36        -36 -36.895517
124      17         17  17.272663
128      51         51  51.094356
137     -89        -89 -89.411393
140     -64        -64 -64.180244
36       -8         -8  -8.871026
44      -73        -73 -73.103972
47       -9         -9  -9.281091
49       -2         -2  -2.842961
51       87         87  87.608103
85       24         24  24.658481
90      -53        -53 -53.910388
98       82         82  82.385620
120      79         79  79.276744
127     -43        -43 -43.971857
130      13         13  13.949059
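
If the rows outside the expected conditions are unwanted, one option is to filter them out before grouping. The following is only a minimal sketch: the small frame `t` is a made-up stand-in for the real data (the column names come from the question, the values are illustrative), and the rule that valid trials use positions 1 and 2 is an assumption read off the output above, not something the data source confirms.

```python
import pandas as pd

# Hypothetical miniature of the data: column names match the question,
# values are made up for illustration.
t = pd.DataFrame({
    "cuepos":    [1, 1, 2, 2, 85, -3],
    "targetpos": [1, 2, 1, 2, 85, -3],
    "soa":       [-0.1, 0.4, -0.1, 0.4, 85.689698, -3.265563],
})
keys = ["cuepos", "targetpos", "soa"]

# groupby builds one group per distinct key combination present in the rows,
# so the stray rows each produce their own group.
n_all = t.groupby(keys).ngroups

# Assumption: valid trials only use positions 1 and 2; drop everything else.
valid = t[t["cuepos"].isin([1, 2]) & t["targetpos"].isin([1, 2])]
n_valid = valid.groupby(keys).ngroups

print(n_all, n_valid)
```

Whether filtering is the right fix depends on where the stray rows come from. It may be worth checking how the file was read or assembled first, since the repeated index labels in the output above suggest several pieces were combined.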