'TableIterator' object has no attribute 'shape'

Time: 2015-06-24 02:14:47

Tags: python unit-testing testing pandas pytest

I am trying to take a large DataFrame and pass it to a function that cuts it into chunks.

The test I have written looks like this:

def test_get_dataframe(workspace):
    dataframe = workspace.get_df('testing_df', True)
    assert dataframe.shape[0] == 500000

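Here testing_df is the large DataFrame, but I get the error 'TableIterator' object has no attribute 'shape'.

I am trying to use the shape attribute to check that 500k rows are returned to me by the chunking.

Any help?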

1 answer:

Answer 0: (score: 1)

I wrote a function to do this:

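import pandas as pd

def test_chunks(data, chunk):
    # Collect consecutive slices of `chunk` rows each.
    store = []
    all_full = True

    for count in range(0, data.shape[0], chunk):
        # .iloc slices by position and excludes the end point, so no "- 1"
        # is needed (.ix was removed in pandas 1.0).
        test = pd.DataFrame(data.iloc[count:count + chunk, :])
        if test.shape[0] != chunk:
            # Also fires for a shorter final chunk.
            all_full = False
            print('table chunked incorrectly')
        store.append(test)

    if all_full:
        print('table chunked correctly')

    return store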
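When the row count is not an exact multiple of the chunk size, the last piece is legitimately shorter, so a stricter check is to glue the pieces back together and compare them with the original. The following is only a sketch of that idea: `split_rows` and `check_chunks` are hypothetical names, the 1050-row frame is made-up data, and the input is assumed to be non-empty.

```python
import numpy as np
import pandas as pd


def split_rows(data, chunk):
    """Return consecutive row slices of `data`, each at most `chunk` rows long."""
    return [data.iloc[start:start + chunk] for start in range(0, len(data), chunk)]


def check_chunks(data, chunks, chunk):
    """Raise AssertionError unless `chunks` is a faithful split of `data`."""
    # Every piece except possibly the last must be exactly `chunk` rows long.
    assert all(len(piece) == chunk for piece in chunks[:-1])
    # The last piece may be shorter, but never empty or oversized.
    assert 0 < len(chunks[-1]) <= chunk
    # Concatenating the pieces must reproduce the original frame.
    pd.testing.assert_frame_equal(pd.concat(chunks), data)


# Hypothetical data: 1050 rows do not divide evenly by 500.
df = pd.DataFrame(np.random.randn(1050, 1))
pieces = split_rows(df, 500)
check_chunks(df, pieces, 500)
sizes = [len(piece) for piece in pieces]
```

Raising an `AssertionError` instead of printing a message lets a test runner such as pytest report the failure directly.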

test_chunks returns a list of chunks, so you can verify that your table was split correctly:

In [72]: df = pd.DataFrame(np.random.randn(1000000,1))

In [73]: df
Out[73]: 
               0
0      -1.182168
1      -1.505764
2      -0.162236
3      -0.621377
4       2.341008
5      -1.506157
6      -0.116629
7       0.957325
8       0.367071
9       0.647191
10     -2.419967
11      0.442284
12      0.241475
13      0.171289
14     -0.624512
15     -0.780075
16     -1.627152
17     -0.100081
18     -0.540503
19     -1.126215
20      0.649648
21     -0.812951
22      0.596237
23     -1.413866
24      0.343937
25     -0.767372
26     -1.632577
27     -0.065164
28     -1.239659
29     -0.810848
...          ...
999970 -2.027269
999971 -0.149554
999972  1.217983
999973  0.453195
999974  0.514412
999975  0.151795
999976 -1.170795
999977 -0.945090
999978  1.385541
999979 -1.084080
999980 -0.564011
999981  1.497476
999982 -0.422143
999983  0.989664
999984  1.295070
999985 -0.838345
999986 -1.110576
999987  0.659037
999988 -1.099105
999989 -0.869162
999990  1.147460
999991  1.543114
999992  1.494555
999993 -1.526764
999994  0.025678
999995 -0.247338
999996 -0.985417
999997  0.356573
999998 -0.622785
999999 -0.100821

[1000000 rows x 1 columns]

In [74]: df = pd.DataFrame(np.random.randn(1000000,1))

In [75]: %paste
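def test_chunks(data, chunk):
    # Collect consecutive slices of `chunk` rows each.
    store = []
    all_full = True

    for count in range(0, data.shape[0], chunk):
        # .iloc slices by position and excludes the end point, so no "- 1"
        # is needed (.ix was removed in pandas 1.0).
        test = pd.DataFrame(data.iloc[count:count + chunk, :])
        if test.shape[0] != chunk:
            # Also fires for a shorter final chunk.
            all_full = False
            print('table chunked incorrectly')
        store.append(test)

    if all_full:
        print('table chunked correctly')

    return store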
## -- End pasted text --

In [76]: test_chunks(df, 500000)
table chunked correctly
Out[76]: 
[               0
 0      -0.770808
 1      -0.941473
 2       0.508013
 3       0.424950
 4       0.101314
 5      -1.154268
 6      -0.932678
 7       0.844011
 8       0.281692
 9       2.376677
 10      0.555523
 11     -0.565176
 12     -0.091829
 13     -1.262907
 14      0.769793
 15     -0.369955
 16     -0.071488
 17     -2.051964
 18      1.101495
 19      0.355003
 20     -0.537814
 21      1.368524
 22     -1.164048
 23     -1.483500
 24      0.737210
 25      0.228551
 26     -1.500423
 27      1.013433
 28      0.722119
 29      0.253644
 ...          ...
 499970  1.266769
 499971  0.594241
 499972  0.210255
 499973  0.730457
 499974 -0.454487
 499975 -0.125958
 499976  0.655793
 499977 -0.169799
 499978 -2.051298
 499979  0.066739
 499980  0.011063
 499981  0.707727
 499982 -1.070386
 499983 -0.875807
 499984 -1.283149
 499985  0.685271
 499986 -0.981217
 499987 -1.978422
 499988 -0.424755
 499989  0.976395
 499990  0.892599
 499991  0.582446
 499992 -2.256608
 499993 -0.915423
 499994  0.080076
 499995  2.350798
 499996 -0.208804
 499997  0.303654
 499998  1.730798
 499999  1.833389

 [500000 rows x 1 columns],                0
 500000  0.232947
 500001  0.335351
 500002 -0.252290
 500003  1.251981
 500004 -0.190665
 500005  1.686744
 500006 -0.398652
 500007 -1.732415
 500008  1.441498
 500009  0.574721
 500010 -1.586857
 500011  0.090962
 500012  0.041795
 500013 -0.074869
 500014 -0.549962
 500015  0.726490
 500016 -2.686839
 500017  1.369451
 500018 -1.947568
 500019 -0.115681
 500020 -0.292935
 500021 -0.535109
 500022 -1.276597
 500023 -1.228783
 500024  0.705259
 500025  0.538611
 500026 -0.100649
 500027 -1.145738
 500028  0.716736
 500029 -0.354400
 ...          ...
 999970 -0.682481
 999971 -0.823475
 999972 -1.144725
 999973  0.305905
 999974 -1.520020
 999975 -0.049710
 999976 -0.171224
 999977 -0.133479
 999978 -0.259963
 999979 -1.618230
 999980 -0.042287
 999981 -1.204132
 999982 -1.195320
 999983  0.343836
 999984 -0.163967
 999985  0.285751
 999986  0.476105
 999987 -0.657065
 999988 -0.259893
 999989 -0.481626
 999990  0.615710
 999991  0.111523
 999992 -0.278765
 999993 -0.597503
 999994 -0.356952
 999995 -0.156546
 999996 -0.082010
 999997 -0.296540
 999998  0.184973
 999999  0.127719

 [500000 rows x 1 columns]]
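
As for the error in the question itself: in pandas, `TableIterator` is what `HDFStore.select` hands back when it is called with `iterator=True` or a `chunksize`. So `workspace.get_df('testing_df', True)` is presumably returning an iterator over chunks rather than a single DataFrame (an assumption, since `get_df` is not shown). An iterator has no `.shape`, but each chunk it yields does. The sketch below uses `pd.read_csv(..., chunksize=...)` as a stand-in, because it returns a comparable chunk iterator without needing PyTables; the CSV data and variable names are made up for illustration.

```python
import io

import pandas as pd

# Build a small CSV in memory so the example needs no files on disk.
csv_text = "value\n" + "\n".join(str(i) for i in range(1000))

# With chunksize set, read_csv returns an iterator of DataFrames,
# not a DataFrame (a stand-in for the TableIterator in the question).
reader = pd.read_csv(io.StringIO(csv_text), chunksize=300)
is_frame = isinstance(reader, pd.DataFrame)

# Each item the iterator yields is an ordinary DataFrame,
# so rows can be counted chunk by chunk and summed.
chunk_sizes = [chunk.shape[0] for chunk in reader]
total_rows = sum(chunk_sizes)
```

Under the same assumption, the assertion in the question's test would become something like `assert sum(len(chunk) for chunk in dataframe) == 500000`.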