Subsetting based on row count

Time: 2018-06-17 13:19:01

Tags: pandas numpy subset

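I have the following dataset, whose shape is (118, 2).

I want to take subsets of the data. My goal is to subset it in a way that does not require me to repeat the following: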

removeTotal[['Firms', 'IndustrySsize']][:8]
removeTotal[['Firms', 'IndustrySsize']][8:16]
removeTotal[['Firms', 'IndustrySsize']][16:24]
removeTotal[['Firms', 'IndustrySsize']][24:32]
removeTotal[['Firms', 'IndustrySsize']][32:40]
removeTotal[['Firms', 'IndustrySsize']][40:48]
removeTotal[['Firms', 'IndustrySsize']][48:56]
removeTotal[['Firms', 'IndustrySsize']][56:64]

In other words, I would like to replace the numbers 8, 16, 24, etc. in the syntax above with n or something similar.

Firms   IndustrySsize
1   3598185 0-4
2   998953  5-9
3   608502  10-19
4   5205640 0-19
5   513179  20-99
6   87563   100-499
7   5806382 0-499
8   19076   500
10  3575290 0-4
11  992281  5-9
12  600551  10-19
13  5168122 0-19
14  503033  20-99
15  85264   100-499
16  5756419 0-499
17  18636   500
19  3532058 0-4
20  978993  5-9
21  592963  10-19
22  5104014 0-19
23  481496  20-99
24  81243   100-499
25  5666753 0-499
26  17671   500
28  3575240 0-4
29  968075  5-9
30  617089  10-19
31  5160404 0-19
32  475125  20-99
33  81773   100-499
... ... ...
99  85304   100-499
100 5640407 0-499
101 17367   500
103 726862  0
104 2669870 1-4
105 1021210 5-9
106 617087  10-19
107 5035029 0-19
108 515977  20-99
109 84385   100-499
110 5635391 0-499
111 17153   500
113 709074  0
114 2680087 1-4
115 1012954 5-9
116 605693  10-19
117 5007808 0-19
118 501848  20-99
119 81347   100-499
120 5591003 0-499
121 16740   500
123 711899  0
124 2664452 1-4
125 1011849 5-9
126 600167  10-19
127 4988367 0-19
128 494357  20-99
129 80075   100-499
130 5562799 0-499
131 16378   500

1 answer:

Answer 0: (score: 0)

Just use numpy.split:

In [59]: np.split(df, range(8, df.shape[0], 8))
Out[59]:
[         Firms IndustrySsize
 index
 1      3598185           0-4
 2       998953           5-9
 3       608502         10-19
 4      5205640          0-19
 5       513179         20-99
 6        87563       100-499
 7      5806382         0-499
 8        19076           500,          Firms IndustrySsize
 index
 10     3575290           0-4
 11      992281           5-9
 12      600551         10-19
 13     5168122          0-19
 14      503033         20-99
 15       85264       100-499
 16     5756419         0-499
 17       18636           500,          Firms IndustrySsize
 index
 19     3532058           0-4
 20      978993           5-9
 21      592963         10-19
 22     5104014          0-19
 23      481496         20-99
 24       81243       100-499
 25     5666753         0-499
 26       17671           500,          Firms IndustrySsize
 index
 28     3575240           0-4
 29      968075           5-9
 30      617089         10-19
 31     5160404          0-19
 32      475125         20-99
 33       81773       100-499]

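range(8, df.shape[0], 8) computes the split points you mentioned (8, 16, ...) up to the end of your DataFrame, and np.split cuts the frame at each of those row positions.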
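The same chunking can also be sketched with plain positional slicing, which may be handy if np.split on a DataFrame emits a deprecation warning in your pandas version. This is only an illustration: the helper name split_every and the small demo frame are made up here and stand in for removeTotal.

```python
import numpy as np
import pandas as pd


def split_every(df, n):
    """Return consecutive n-row slices of df; the last slice may be shorter."""
    # range(0, len(df), n) yields the start positions 0, n, 2n, ...
    return [df.iloc[start:start + n] for start in range(0, len(df), n)]


# Hypothetical stand-in for removeTotal: 20 rows, same two column names.
demo = pd.DataFrame({
    "Firms": np.arange(20),
    "IndustrySsize": [str(i) for i in range(20)],
})

chunks = split_every(demo, 8)
print([len(c) for c in chunks])  # [8, 8, 4]
```

Here n replaces the hard-coded 8, 16, 24, ... from the question, and chunks[i] corresponds to removeTotal[['Firms', 'IndustrySsize']][i*n:(i+1)*n].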