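If I group (the g object below) and then apply the following function to the first 1000 rows of df, it works. But if I apply it to the entire df, I get this exception: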
def calc_load(x):
    # note: sort() returns a new frame by default; the result is not assigned here
    x.sort('log_timestamp')
    x['time_stddev'] = x['time'].std()
    x['time_mean'] = x['time'].mean()
    return x

c=g.apply(calc_load)
---------------------------------------------------------------------------
........
ValueError Traceback (most recent call last)
<ipython-input-262-f2fe1f013907> in <module>()
----> 1 c=g.apply(calc_load)
2215 tuple(map(int, [tot_items] + list(block_shape))),
-> 2216 tuple(map(int, [len(ax) for ax in axes]))))
2217
2218
ValueError: Shape of passed values is (10, 3943482), indices imply (10, 410450)
What is the cause here, and how can I fix it?
Update
I am reading this table from an HDF5 store:
prob2
Out[374]:
<class 'pandas.io.pytables.HDFStore'>
File path: /tmp/test2.h5
/mytable frame_table (typ->appendable,nrows->410450,ncols->8,indexers->[index])
a=prob2.mytable
a
Out[376]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 410450 entries, 0 to 9999
Data columns (total 8 columns):
args 410450 non-null values
host 410450 non-null values
kwargs 410450 non-null values
log_timestamp 410450 non-null values
operation 410450 non-null values
slot 410450 non-null values
status 410450 non-null values
time 410450 non-null values
dtypes: float64(1), int64(2), object(5)
If I round-trip the data through CSV as shown below, the exception does not occur:
a.to_csv('/tmp/test2.csv')
b=pd.read_csv('/tmp/test2.csv')
b
Out[379]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 410450 entries, 0 to 410449
Data columns (total 9 columns):
Unnamed: 0 410450 non-null values
args 410450 non-null values
host 410450 non-null values
kwargs 410450 non-null values
log_timestamp 410450 non-null values
operation 410450 non-null values
slot 410450 non-null values
status 410450 non-null values
time 410450 non-null values
dtypes: float64(1), int64(3), object(5)
bg = b.groupby(['host','operation'])
bg.apply(calc_load)
Out[381]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 410450 entries, 0 to 410449
Data columns (total 11 columns):
Unnamed: 0 410450 non-null values
args 410450 non-null values
host 410450 non-null values
kwargs 410450 non-null values
log_timestamp 410450 non-null values
operation 410450 non-null values
slot 410450 non-null values
status 410450 non-null values
time 410450 non-null values
time_stddev 410371 non-null values
time_mean 410450 non-null values
dtypes: float64(3), int64(3), object(5)
The dataframes before the round trip (a) and after it (b) look similar, but they are not identical!
a
Out[386]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 410450 entries, 0 to 9999
Data columns (total 8 columns):
args 410450 non-null values
host 410450 non-null values
kwargs 410450 non-null values
log_timestamp 410450 non-null values
operation 410450 non-null values
slot 410450 non-null values
status 410450 non-null values
time 410450 non-null values
dtypes: float64(1), int64(2), object(5)
b
Out[387]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 410450 entries, 0 to 410449
Data columns (total 9 columns):
Unnamed: 0 410450 non-null values
args 410450 non-null values
host 410450 non-null values
kwargs 410450 non-null values
log_timestamp 410450 non-null values
operation 410450 non-null values
slot 410450 non-null values
status 410450 non-null values
time 410450 non-null values
dtypes: float64(1), int64(3), object(5)
Uh, what is going on here?
Answer 0 (score: 4)
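You have many duplicates in the index after grouping by host/operation. The index of a runs only from 0 to 9999 across 410450 rows, so labels repeat, whereas the CSV round trip gives b a fresh unique index from 0 to 410449. That is why the first 1000 rows probably work, but the entire set does not.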
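A quick way to confirm this kind of diagnosis is to ask the index whether its labels are unique. The sketch below uses a small hypothetical frame (toy, with made-up values) rather than the real table. The repeated labels stand in for an index like the one on a, on the assumption that it was built from appended chunks that each started counting from 0.

```python
import pandas as pd

# Hypothetical toy frame: the index labels 0 and 1 each appear twice,
# standing in for the repeated labels seen on the real table (an assumption).
toy = pd.DataFrame(
    {"host": ["h1", "h1", "h2", "h2"], "time": [0.1, 0.2, 0.3, 0.4]},
    index=[0, 1, 0, 1],
)

# True only when every index label occurs exactly once.
has_unique_index = toy.index.is_unique

# Number of rows whose label has already appeared earlier in the index.
n_repeated = int(toy.index.duplicated().sum())

print(has_unique_index, n_repeated)
```

Running the same two checks on a and on b should show the difference between the two frames directly.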
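Reset the index first, then group and apply. You can recover the original index by setting the index again at the end. Resetting the index turns it into a column named 'index' (which set_index then removes).
This is actually a fairly common pattern. I think there could be a more helpful error message, see here. I am not sure groupby should fix this automatically (it could), because this may be either a user error or intentional.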
In [26]: df = d.reset_index().groupby(['host','operation']).apply(calc_load).set_index('index')
In [27]: df
Out[27]:
args host kwargs log_timestamp operation slot status time time_stddev time_mean
index
0 [] yy3.segm1.org {} 1385984306000000000 x_gWidgboxParams a12yy3 -101 0.000477 0.061657 0.003226
1 [] yy14.segm1.org {} 1385984306000000000 x_initWidgbox a11yy14 1 0.004177 0.035759 0.005816
10 [] yy32.segm1.org {} 1385984307000000000 gSettings a13yy32 -101 0.009686 0.245170 0.070137
100 [] yy19.segm1.org {} 1385984308000000000 notifyTestsDelivered a16yy19 1 0.000766 0.002825 0.000964
1000 [] yy7.segm1.org {} 1385984320000000000 addWidging2 a12yy7 1 0.002576 0.008525 0.004122
10000 [] yy14.segm1.org {} 1385984461000000000 addWidging2 a13yy14 1 0.001317 0.009431 0.003910
10001 [] yy14.segm1.org {} 1385984461000000000 gxyzinf a13yy14 -101 0.000542 0.001861 0.001074
10002 [] yy20.segm1.org {} 1385984461000000000 x_gbinf I502yy20 -101 0.000522 0.001043 0.000743
10003 [] yy20.segm1.org {} 1385984461000000000 setFlagsOneWidg I502yy20 1 0.001660 0.005404 0.002910
10004 [] yy14.segm1.org {} 1385984461000000000 notifyTestsDelivered a13yy14 1 0.000551 0.002877 0.001156
10005 [] yy20.segm1.org {} 1385984461000000000 gxyzinf I502yy20 -101 0.000521 0.000802 0.000813
10006 [] yy14.segm1.org {} 1385984461000000000 addWidging2 a13yy14 1 0.001256 0.009431 0.003910
10007 [] yy14.segm1.org {} 1385984461000000000 gxyzinf a13yy14 -101 0.000414 0.001861 0.001074
10008 [] yy14.segm1.org {} 1385984461000000000 addWidging2 a13yy14 1 0.001222 0.009431 0.003910
10009 [] yy14.segm1.org {} 1385984461000000000 gxyzinf a13yy14 -101 0.000475 0.001861 0.001074
1001 [] yy7.segm1.org {} 1385984320000000000 gxyzinf a12yy7 -101 0.000783 0.003059 0.001004
10010 [] yy14.segm1.org {} 1385984461000000000 x_initWidgbox a12yy14 1 0.002764 0.035759 0.005816
10011 [] yy32.segm1.org {} 1385984461000000000 x_initWidgbox a15yy32 1 0.057966 0.334923 0.147668
10012 [] yy3.segm1.org {} 1385984461000000000 gSettings a11yy3 -101 0.006519 0.163707 0.017649
10013 [] yy30.segm1.org {} 1385984461000000000 gtfull a13yy30 -101 0.003648 0.116366 0.014088
10014 [] yy6.segm1.org {} 1385984461000000000 x_gbinf a16yy6 -101 0.000621 0.005796 0.001139
10015 [] yy34.segm1.org {} 1385984461000000000 gtfull a14yy34 -101 0.002031 0.015581 0.007747
10016 [] yy34.segm1.org {} 1385984461000000000 x_gbinf a14yy34 -101 0.000546 0.002596 0.001899
10017 [] yy34.segm1.org {} 1385984461000000000 setFlagsOneWidg a14yy34 1 0.001358 0.003515 0.005866
10018 [] yy34.segm1.org {} 1385984461000000000 gxyzinf a14yy34 -101 0.000486 0.004446 0.002018
10019 [] yy25.segm1.org {} 1385984461000000000 gtfull a13yy25 -101 0.002029 0.001793 0.002355
1002 [] yy7.segm1.org {} 1385984320000000000 notifyTestsDelivered a12yy7 1 0.000847 0.003748 0.001081
10020 [] yy32.segm1.org {} 1385984462000000000 gFolderId a15yy32 -101 0.018326 0.187434 0.058200
10021 [] yy25.segm1.org {} 1385984462000000000 x_gbinf a13yy25 -101 0.000589 0.001716 0.000830
10022 [] yy25.segm1.org {} 1385984462000000000 updateWidg a13yy25 1 0.003058 0.004660 0.003973
10023 [] yy25.segm1.org {} 1385984462000000000 clearElems a13yy25 1 0.000661 0.004893 0.001687
10024 [] yy10.segm1.org {} 1385984462000000000 gtfull a18yy10 -101 0.002779 0.069679 0.007495
10025 [] yy13.segm1.org {} 1385984462000000000 gtfull a11yy13 -101 0.001978 0.124069 0.012524
10026 [] yy32.segm1.org {} 1385984462000000000 x_gbinf a14yy32 -101 0.018674 0.190657 0.058083
10027 [] yy10.segm1.org {} 1385984462000000000 x_gbinf a18yy10 -101 0.000874 0.007170 0.001606
10028 [] yy32.segm1.org {} 1385984462000000000 gWidgId a14yy32 1 0.014523 1.518315 0.559983
10029 [] yy13.segm1.org {} 1385984462000000000 x_gbinf a11yy13 -101 0.000577 0.008605 0.001130
1003 [] yy7.segm1.org {} 1385984320000000000 x_gWidgboxParams a12yy7 -101 0.000933 0.001084 0.001442
10030 [] yy13.segm1.org {} 1385984462000000000 setFlagsOneWidg a11yy13 1 0.001611 0.011409 0.004093
10031 [] yy13.segm1.org {} 1385984462000000000 gxyzinf a11yy13 -101 0.000575 0.053991 0.003044
10032 [] yy39.segm1.org {} 1385984462000000000 gtfull a13yy39 -101 0.002005 0.034577 0.003504
10033 [] yy39.segm1.org {} 1385984462000000000 x_gbinf a13yy39 -101 0.000539 0.001371 0.000931
10034 [] yy32.segm1.org {} 1385984462000000000 addWidging2 a15yy32 1 0.122369 1.414068 0.441565
10035 [] yy32.segm1.org {} 1385984462000000000 moveOneWidg a12yy32 1 0.468481 1.303089 0.665778
10036 [] yy32.segm1.org {} 1385984462000000000 gxyzinf a15yy32 -101 0.018006 0.155379 0.040389
10037 [] yy32.segm1.org {} 1385984462000000000 notifyTestsDelivered a15yy32 1 0.006874 0.129650 0.032741
10038 [] yy32.segm1.org {} 1385984462000000000 gxyzinf a12yy32 -101 0.016607 0.155379 0.040389
10039 [] yy39.segm1.org {} 1385984462000000000 updateWidg a13yy39 1 0.003879 0.005466 0.006465
1004 [] yy34.segm1.org {} 1385984320000000000 gtfull a11yy34 -101 0.003681 0.015581 0.007747
10040 [] yy39.segm1.org {} 1385984462000000000 SELECT a13yy39 217831 0.000423 0.000126 0.000551
10041 [] yy39.segm1.org {} 1385984462000000000 clearElems a13yy39 1 0.000705 0.002367 0.001356
10042 [] yy3.segm1.org {} 1385984462000000000 moveOneWidg a15yy3 1 0.002660 0.027428 0.009078
10043 [] yy3.segm1.org {} 1385984462000000000 gxyzinf a15yy3 -101 0.000436 0.041627 0.001913
10044 [] yy39.segm1.org {} 1385984462000000000 gSettings a11yy39 -101 0.002237 0.007467 0.002679
10045 [] yy32.segm1.org {} 1385984462000000000 gSettings a15yy32 -101 0.012113 0.245170 0.070137
10046 [] yy32.segm1.org {} 1385984462000000000 x_gWidgboxParams a15yy32 -101 0.030427 0.143941 0.050055
10047 [] yy13.segm1.org {} 1385984462000000000 moveOneWidg a12yy13 1 0.003796 0.117085 0.017910
10048 [] yy13.segm1.org {} 1385984462000000000 gxyzinf a12yy13 -101 0.000521 0.053991 0.003044
10049 [] yy30.segm1.org {} 1385984462000000000 x_gWidgboxParams a13yy30 -101 0.002451 0.051829 0.003644
1005 [] yy12.segm1.org {} 1385984320000000000 gtfull a15yy12 -101 0.003428 0.005479 0.003063
... ... ... ... ... ... ... ... ... ...
[410450 rows x 10 columns]
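For readers on a current pandas release, the same reset, group, apply, restore pattern can be sketched as follows. This is an illustration under assumptions, not the answer's exact code: the frame toy and its values are hypothetical, sort_values replaces the old sort method (which no longer exists), and group_keys=False is passed so that the group labels are not added to the result index.

```python
import pandas as pd


def calc_load(x):
    # sort_values returns a new frame, so the result is assigned back
    x = x.sort_values("log_timestamp")
    x["time_stddev"] = x["time"].std()
    x["time_mean"] = x["time"].mean()
    return x


# Hypothetical toy data with a duplicated index (labels 0 and 1 repeat).
toy = pd.DataFrame(
    {
        "host": ["h1", "h1", "h2", "h2"],
        "operation": ["op", "op", "op", "op"],
        "log_timestamp": [2, 1, 4, 3],
        "time": [1.0, 3.0, 2.0, 6.0],
    },
    index=[0, 1, 0, 1],
)

result = (
    toy.reset_index()                  # old index becomes a column named 'index'
    .groupby(["host", "operation"], group_keys=False)
    .apply(calc_load)                  # runs on a frame with a unique index
    .set_index("index")                # put the original labels back
)
print(result[["time", "time_mean", "time_stddev"]])
```

Depending on the pandas version, apply may warn about, or exclude, the grouping columns in the frame passed to calc_load; the per-group statistics are unaffected either way.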