I am trying to fetch a large DataFrame and pass it to a function that cuts it into chunks.
The test I wrote for this looks like the following:
    def test_get_dataframe(workspace):
        dataframe = workspace.get_df('testing_df', True)
        assert dataframe.shape[0] == 500000
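Here testing_df is the large DataFrame, but I get the error: 'TableIterator' object has no attribute 'shape'.
I am trying to use the shape attribute to test whether 500k rows are returned to me through the chunking.
Any help?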
Answer 0 (score: 1)
I wrote a function to do this:
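    import pandas as pd

    def test_chunks(data, chunk):
        """Split data into consecutive pieces of `chunk` rows and check each piece."""
        store = []
        ok = True
        for count in range(0, data.shape[0], chunk):
            # .iloc slices by position and excludes the end point, so no "- 1" is needed
            test = data.iloc[count:count + chunk]
            # the final piece may be shorter when the row count is not a multiple of chunk
            expected = min(chunk, data.shape[0] - count)
            if test.shape[0] != expected:
                ok = False
            store.append(test)
        print('table chunked correctly' if ok else 'table chunked incorrectly')
        return store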
It returns a list of the chunks, so you can verify that your table was split correctly:
In [72]: df = pd.DataFrame(np.random.randn(1000000,1))
In [73]: df
Out[73]:
0
0 -1.182168
1 -1.505764
2 -0.162236
3 -0.621377
4 2.341008
5 -1.506157
6 -0.116629
7 0.957325
8 0.367071
9 0.647191
10 -2.419967
11 0.442284
12 0.241475
13 0.171289
14 -0.624512
15 -0.780075
16 -1.627152
17 -0.100081
18 -0.540503
19 -1.126215
20 0.649648
21 -0.812951
22 0.596237
23 -1.413866
24 0.343937
25 -0.767372
26 -1.632577
27 -0.065164
28 -1.239659
29 -0.810848
... ...
999970 -2.027269
999971 -0.149554
999972 1.217983
999973 0.453195
999974 0.514412
999975 0.151795
999976 -1.170795
999977 -0.945090
999978 1.385541
999979 -1.084080
999980 -0.564011
999981 1.497476
999982 -0.422143
999983 0.989664
999984 1.295070
999985 -0.838345
999986 -1.110576
999987 0.659037
999988 -1.099105
999989 -0.869162
999990 1.147460
999991 1.543114
999992 1.494555
999993 -1.526764
999994 0.025678
999995 -0.247338
999996 -0.985417
999997 0.356573
999998 -0.622785
999999 -0.100821
[1000000 rows x 1 columns]
In [74]: df = pd.DataFrame(np.random.randn(1000000,1))
In [75]: %paste
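    import pandas as pd

    def test_chunks(data, chunk):
        """Split data into consecutive pieces of `chunk` rows and check each piece."""
        store = []
        ok = True
        for count in range(0, data.shape[0], chunk):
            # .iloc slices by position and excludes the end point, so no "- 1" is needed
            test = data.iloc[count:count + chunk]
            # the final piece may be shorter when the row count is not a multiple of chunk
            expected = min(chunk, data.shape[0] - count)
            if test.shape[0] != expected:
                ok = False
            store.append(test)
        print('table chunked correctly' if ok else 'table chunked incorrectly')
        return store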
## -- End pasted text --
In [76]: test_chunks(df, 500000)
table chunked correctly
Out[76]:
[ 0
0 -0.770808
1 -0.941473
2 0.508013
3 0.424950
4 0.101314
5 -1.154268
6 -0.932678
7 0.844011
8 0.281692
9 2.376677
10 0.555523
11 -0.565176
12 -0.091829
13 -1.262907
14 0.769793
15 -0.369955
16 -0.071488
17 -2.051964
18 1.101495
19 0.355003
20 -0.537814
21 1.368524
22 -1.164048
23 -1.483500
24 0.737210
25 0.228551
26 -1.500423
27 1.013433
28 0.722119
29 0.253644
... ...
499970 1.266769
499971 0.594241
499972 0.210255
499973 0.730457
499974 -0.454487
499975 -0.125958
499976 0.655793
499977 -0.169799
499978 -2.051298
499979 0.066739
499980 0.011063
499981 0.707727
499982 -1.070386
499983 -0.875807
499984 -1.283149
499985 0.685271
499986 -0.981217
499987 -1.978422
499988 -0.424755
499989 0.976395
499990 0.892599
499991 0.582446
499992 -2.256608
499993 -0.915423
499994 0.080076
499995 2.350798
499996 -0.208804
499997 0.303654
499998 1.730798
499999 1.833389
[500000 rows x 1 columns], 0
500000 0.232947
500001 0.335351
500002 -0.252290
500003 1.251981
500004 -0.190665
500005 1.686744
500006 -0.398652
500007 -1.732415
500008 1.441498
500009 0.574721
500010 -1.586857
500011 0.090962
500012 0.041795
500013 -0.074869
500014 -0.549962
500015 0.726490
500016 -2.686839
500017 1.369451
500018 -1.947568
500019 -0.115681
500020 -0.292935
500021 -0.535109
500022 -1.276597
500023 -1.228783
500024 0.705259
500025 0.538611
500026 -0.100649
500027 -1.145738
500028 0.716736
500029 -0.354400
... ...
999970 -0.682481
999971 -0.823475
999972 -1.144725
999973 0.305905
999974 -1.520020
999975 -0.049710
999976 -0.171224
999977 -0.133479
999978 -0.259963
999979 -1.618230
999980 -0.042287
999981 -1.204132
999982 -1.195320
999983 0.343836
999984 -0.163967
999985 0.285751
999986 0.476105
999987 -0.657065
999988 -0.259893
999989 -0.481626
999990 0.615710
999991 0.111523
999992 -0.278765
999993 -0.597503
999994 -0.356952
999995 -0.156546
999996 -0.082010
999997 -0.296540
999998 0.184973
999999 0.127719
[500000 rows x 1 columns]]
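The error in the question comes from reading .shape on a TableIterator, which is an iterator and not a DataFrame. Below is a minimal sketch of one way to test the total row count instead. It assumes the object returned by workspace.get_df yields DataFrame chunks when looped over. The names count_rows and fake_table_iterator are hypothetical and are not part of the original code; the generator only stands in for the real iterator so that the example runs on its own.

```python
import numpy as np
import pandas as pd


def count_rows(chunks):
    """Sum the row counts of an iterable of DataFrame chunks (hypothetical helper)."""
    return sum(len(chunk) for chunk in chunks)


def fake_table_iterator(df, chunksize):
    """Hypothetical stand-in for the iterator returned by workspace.get_df."""
    for start in range(0, len(df), chunksize):
        # positional slice; the last chunk may be shorter than chunksize
        yield df.iloc[start:start + chunksize]


df = pd.DataFrame(np.random.randn(500000, 1))
total = count_rows(fake_table_iterator(df, 100000))
assert total == 500000
```

If that assumption holds, the original test could probably be written as assert count_rows(workspace.get_df('testing_df', True)) == 500000, which consumes the iterator once and never needs a single DataFrame holding all the rows.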