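I have the following dataset, whose shape is (118, 2).
I want to take subsets of the data. My goal is to subset it in a way that does not require me to repeat the following: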
removeTotal[['Firms', 'IndustrySsize']][:8]
removeTotal[['Firms', 'IndustrySsize']][8:16]
removeTotal[['Firms', 'IndustrySsize']][24:32]
removeTotal[['Firms', 'IndustrySsize']][32:40]
removeTotal[['Firms', 'IndustrySsize']][40:48]
removeTotal[['Firms', 'IndustrySsize']][48:56]
removeTotal[['Firms', 'IndustrySsize']][56:64]
That is, I would like to replace the numbers 8, 16, 24, etc. in the syntax above with n or something similar.
Firms IndustrySsize
1 3598185 0-4
2 998953 5-9
3 608502 10-19
4 5205640 0-19
5 513179 20-99
6 87563 100-499
7 5806382 0-499
8 19076 500
10 3575290 0-4
11 992281 5-9
12 600551 10-19
13 5168122 0-19
14 503033 20-99
15 85264 100-499
16 5756419 0-499
17 18636 500
19 3532058 0-4
20 978993 5-9
21 592963 10-19
22 5104014 0-19
23 481496 20-99
24 81243 100-499
25 5666753 0-499
26 17671 500
28 3575240 0-4
29 968075 5-9
30 617089 10-19
31 5160404 0-19
32 475125 20-99
33 81773 100-499
... ... ...
99 85304 100-499
100 5640407 0-499
101 17367 500
103 726862 0
104 2669870 1-4
105 1021210 5-9
106 617087 10-19
107 5035029 0-19
108 515977 20-99
109 84385 100-499
110 5635391 0-499
111 17153 500
113 709074 0
114 2680087 1-4
115 1012954 5-9
116 605693 10-19
117 5007808 0-19
118 501848 20-99
119 81347 100-499
120 5591003 0-499
121 16740 500
123 711899 0
124 2664452 1-4
125 1011849 5-9
126 600167 10-19
127 4988367 0-19
128 494357 20-99
129 80075 100-499
130 5562799 0-499
131 16378 500
Answer 0 (score: 0)
Just use numpy.split (here df stands for your DataFrame and np for numpy):
In [59]: np.split(df, range(8, df.shape[0], 8))
Out[59]:
[ Firms IndustrySsize
index
1 3598185 0-4
2 998953 5-9
3 608502 10-19
4 5205640 0-19
5 513179 20-99
6 87563 100-499
7 5806382 0-499
8 19076 500, Firms IndustrySsize
index
10 3575290 0-4
11 992281 5-9
12 600551 10-19
13 5168122 0-19
14 503033 20-99
15 85264 100-499
16 5756419 0-499
17 18636 500, Firms IndustrySsize
index
19 3532058 0-4
20 978993 5-9
21 592963 10-19
22 5104014 0-19
23 481496 20-99
24 81243 100-499
25 5666753 0-499
26 17671 500, Firms IndustrySsize
index
28 3575240 0-4
29 968075 5-9
30 617089 10-19
31 5160404 0-19
32 475125 20-99
33 81773 100-499]
range(8, df.shape[0], 8)
lets me compute the split points you mentioned (8, 16, ...) up to the end of your DataFrame.
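If you would rather not pass a DataFrame to NumPy, the same idea can be sketched with plain positional slicing, where n is the chunk size. This is only a minimal sketch: the helper name split_every_n and the toy data are made up for illustration and are not from your dataset.

```python
import pandas as pd


def split_every_n(df, n):
    """Return a list of consecutive row chunks of df, each with at most n rows.

    Hypothetical helper for illustration; uses positional slicing (iloc),
    so gaps in the index labels do not matter.
    """
    return [df.iloc[start:start + n] for start in range(0, len(df), n)]


# Toy data (assumed, not the real dataset): 20 rows with the same column names.
df = pd.DataFrame({
    "Firms": range(20),
    "IndustrySsize": ["0-4"] * 20,
})

chunks = split_every_n(df, 8)

# Each element is an ordinary DataFrame; the last one holds the leftover rows.
for i, chunk in enumerate(chunks):
    print(i, chunk.shape)
```

With this, chunks[0] plays the role of removeTotal[['Firms', 'IndustrySsize']][:8], chunks[1] the role of [8:16], and so on, and changing n changes every slice at once.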