Pandas Iterrows Row Number & Percentage

Date: 2017-07-02 13:39:26

Tags: python pandas

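I am iterating over a DataFrame with 1000 rows. Ideally, I would like to know the progress of my loop, i.e. how many rows it has completed, what percentage of the total rows it has completed, and so on.

Is there a way to print the row number or, even better, the percentage of rows iterated over?

My current code is below. At the moment it prints what looks like some kind of tuple/list, but all I need is the row number. This is probably simple.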

for row in testDF.iterrows():
    print("Currently on row: "+str(row))

Desired printed output:

Currently on row 1; Currently iterrated 1% of rows
Currently on row 2; Currently iterrated 2% of rows
Currently on row 3; Currently iterrated 3% of rows
Currently on row 4; Currently iterrated 4% of rows
Currently on row 5; Currently iterrated 5% of rows

3 Answers:

Answer 0 (score: 2)

One possible solution using format, if the index is unique and monotonic (0, 1, 2, ...):

for i, row in testDF.iterrows():
        print("Currently on row: {}; Currently iterrated {}% of rows".format(i, (i + 1)/len(testDF.index) * 100))

Sample:

import numpy as np
import pandas as pd
np.random.seed(1332)
testDF = pd.DataFrame(np.random.randint(10, size=(10, 3)))
print(testDF)
   0  1  2
0  8  1  9
1  4  3  5
2  0  1  3
3  1  8  6
4  7  4  7
5  7  5  3
6  7  9  9
7  0  1  2
8  1  3  4
9  0  0  3

for i, row in testDF.iterrows():
        print("Currently on row: {}; Currently iterrated {}% of rows".format(i, (i + 1)/len(testDF.index) * 100))
Currently on row: 0; Currently iterrated 10.0% of rows
Currently on row: 1; Currently iterrated 20.0% of rows
Currently on row: 2; Currently iterrated 30.0% of rows
Currently on row: 3; Currently iterrated 40.0% of rows
Currently on row: 4; Currently iterrated 50.0% of rows
Currently on row: 5; Currently iterrated 60.0% of rows
Currently on row: 6; Currently iterrated 70.0% of rows
Currently on row: 7; Currently iterrated 80.0% of rows
Currently on row: 8; Currently iterrated 90.0% of rows
Currently on row: 9; Currently iterrated 100.0% of rows
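
The loop above only gives correct percentages when the index labels are exactly 0, 1, 2, ... A quick way to check that assumption before using `i` as a row counter might look like the following sketch; the helper name `has_default_index` and both sample frames are made up for illustration.

```python
import pandas as pd

def has_default_index(df):
    """Return True if the index labels are exactly 0, 1, ..., len(df) - 1."""
    # Index.equals compares the labels element by element, in order.
    return df.index.equals(pd.RangeIndex(len(df)))

# Hypothetical frames: one with the default index, one with custom labels.
default_df = pd.DataFrame({"a": [10, 20, 30]})
custom_df = pd.DataFrame({"a": [10, 20, 30]}, index=[2, 4, 5])

print(has_default_index(default_df))
print(has_default_index(custom_df))
```

If the check fails, a separate positional counter is needed, because the index labels no longer say how many rows have been processed.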

Edit:

If the index has custom values, one solution is to zip numpy.arange over the length of the index (which is the same as the length of the DataFrame) with iterrows:

np.random.seed(1332)
testDF = pd.DataFrame(np.random.randint(10, size=(10, 3)), index=[2,4,5,6,7,8,2,1,3,5])
print(testDF)
   0  1  2
2  8  1  9
4  4  3  5
5  0  1  3
6  1  8  6
7  7  4  7
8  7  5  3
2  7  9  9
1  0  1  2
3  1  3  4
5  0  0  3

for i, (idx, row) in zip(np.arange(len(testDF.index)), testDF.iterrows()):
        print("Currently on row: {}; Currently iterrated {}% of rows".format(idx, (i + 1)/len(testDF.index) * 100))
Currently on row: 2; Currently iterrated 10.0% of rows
Currently on row: 4; Currently iterrated 20.0% of rows
Currently on row: 5; Currently iterrated 30.0% of rows
Currently on row: 6; Currently iterrated 40.0% of rows
Currently on row: 7; Currently iterrated 50.0% of rows
Currently on row: 8; Currently iterrated 60.0% of rows
Currently on row: 2; Currently iterrated 70.0% of rows
Currently on row: 1; Currently iterrated 80.0% of rows
Currently on row: 3; Currently iterrated 90.0% of rows
Currently on row: 5; Currently iterrated 100.0% of rows


Answer 1 (score: 2)

First of all, iterrows yields (index, row) tuples. So the correct code is

for index, row in testDF.iterrows():

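In general, the index is not a row number; it is an identifier. This is one of the strengths of pandas, but it causes some confusion because it does not behave like an ordinary Python list, where the index is the position. That is why we need to count rows independently. We could introduce line_number = 0 and increment it with line_number += 1 on each loop iteration, but Python gives us a ready-made tool: enumerate, which yields (line_number, value) tuples instead of just value. So we arrive at this code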

for (line_number, (index, row)) in enumerate(testDF.iterrows()):
    print("Currently on row: {}; Currently iterrated {}% of rows".format(
          line_number, 100*(line_number + 1)/len(testDF)))
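
To see how the positional counter from enumerate differs from the index label, here is a small self-contained sketch. The frame and its column name `value` are invented for illustration, and the messages are collected in a list only so that they are easy to inspect.

```python
import pandas as pd

# Hypothetical frame with non-sequential, repeated index labels.
df = pd.DataFrame({"value": [8, 4, 0, 1]}, index=[2, 4, 5, 2])

messages = []
total = len(df)
for line_number, (index, row) in enumerate(df.iterrows()):
    # line_number counts positions 0, 1, 2, ...; index is the row's label.
    percent = 100 * (line_number + 1) / total
    messages.append(
        "position {} (label {}): {:.0f}% of rows".format(line_number, index, percent)
    )

print("\n".join(messages))
```

The label 2 appears twice here, yet the position keeps increasing, which is why the percentage has to be derived from the position rather than from the label.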

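P.S. When you divide integers, Python 2 returns an integer, which is why 999/1000 = 0, which is not what you expect. So you can either convert to float or move the 100* to the front to get an integer percentage.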
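
The arithmetic behind this note can be checked directly. In Python 3, `/` is always true division, so the sketch below uses `//` (floor division) to reproduce what Python 2's `/` does with two integers; the variable names are illustrative.

```python
done, total = 999, 1000

# Floor division truncates before the multiplication: this is the Python 2 pitfall.
truncated = done // total * 100

# Multiplying first keeps the intermediate value large enough
# to yield an integer percentage.
integer_percent = 100 * done // total

# Converting one operand to float gives a fractional percentage
# on both Python 2 and Python 3.
float_percent = float(done) / total * 100

print(truncated, integer_percent, float_percent)
```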

Answer 2 (score: 2)

For large DataFrames it may be better to limit printing, which is a time-consuming task. Here is one way:
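
A minimal sketch of one such approach, assuming a hypothetical `print_every` parameter that controls how often a message is emitted. The function name, the message wording, and the sample frame are illustrative and are not taken from the answer.

```python
import pandas as pd

def iterate_with_progress(df, print_every=100):
    """Loop over df.iterrows(), reporting progress every `print_every` rows."""
    total = len(df)
    reports = []
    for line_number, (index, row) in enumerate(df.iterrows(), start=1):
        # ... per-row work would go here ...
        # Report only on every `print_every`-th row and on the final row.
        if line_number % print_every == 0 or line_number == total:
            message = "Processed {} of {} rows ({:.0f}%)".format(
                line_number, total, 100 * line_number / total
            )
            print(message)
            reports.append(message)
    return reports

# Hypothetical 1000-row frame, reporting every 250 rows.
sample = pd.DataFrame({"value": range(1000)})
reports = iterate_with_progress(sample, print_every=250)
```

With these settings the loop prints four messages instead of one thousand, so the cost of printing stays small relative to the work done per row.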
