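I am iterating over a DataFrame with 1000 rows. Ideally, I would like to know the progress of my loop, i.e. how many rows it has completed, what percentage of the total rows that is, and so on.
Is there a way to print the row number or, even better, the percentage of rows processed?
My current code is below. At the moment the print shows some kind of tuple/list, but all I need is the row number. This is probably simple.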
for row in testDF.iterrows():
    print("Currently on row: "+str(row))
Desired printed output:
Currently on row 1; Currently iterrated 1% of rows
Currently on row 2; Currently iterrated 2% of rows
Currently on row 3; Currently iterrated 3% of rows
Currently on row 4; Currently iterrated 4% of rows
Currently on row 5; Currently iterrated 5% of rows
Answer 0 (score: 2)
One possible solution using format, if the index is unique and monotonic (0, 1, 2, ...):
for i, row in testDF.iterrows():
    print("Currently on row: {}; Currently iterrated {}% of rows".format(i, (i + 1)/len(testDF.index) * 100))
Sample:
import numpy as np
import pandas as pd

np.random.seed(1332)
testDF = pd.DataFrame(np.random.randint(10, size=(10, 3)))
print(testDF)
   0  1  2
0  8  1  9
1  4  3  5
2  0  1  3
3  1  8  6
4  7  4  7
5  7  5  3
6  7  9  9
7  0  1  2
8  1  3  4
9  0  0  3
for i, row in testDF.iterrows():
    print("Currently on row: {}; Currently iterrated {}% of rows".format(i, (i + 1)/len(testDF.index) * 100))
Currently on row: 0; Currently iterrated 10.0% of rows
Currently on row: 1; Currently iterrated 20.0% of rows
Currently on row: 2; Currently iterrated 30.0% of rows
Currently on row: 3; Currently iterrated 40.0% of rows
Currently on row: 4; Currently iterrated 50.0% of rows
Currently on row: 5; Currently iterrated 60.0% of rows
Currently on row: 6; Currently iterrated 70.0% of rows
Currently on row: 7; Currently iterrated 80.0% of rows
Currently on row: 8; Currently iterrated 90.0% of rows
Currently on row: 9; Currently iterrated 100.0% of rows
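The percentage above is only correct when the index really is 0, 1, 2, ..., because the index label is used as the row position. A minimal sketch of a check for that condition follows; `has_default_index` is a hypothetical helper name, not a pandas function, and the two small frames are made-up sample data.

```python
import pandas as pd


def has_default_index(df):
    """Return True if the index is exactly 0, 1, ..., len(df) - 1."""
    return list(df.index) == list(range(len(df)))


# Made-up sample frames: one with the default index, one with custom labels.
default_df = pd.DataFrame({"a": [10, 20, 30]})
custom_df = pd.DataFrame({"a": [10, 20, 30]}, index=[2, 4, 2])

print(has_default_index(default_df))
print(has_default_index(custom_df))
```

If the check fails, the row position has to be counted separately from the index label.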
Edit:
Solution with zip and numpy.arange over the length of the index (which is the same as the length of the DataFrame), for when the index holds custom values:
np.random.seed(1332)
testDF = pd.DataFrame(np.random.randint(10, size=(10, 3)), index=[2,4,5,6,7,8,2,1,3,5])
print(testDF)
   0  1  2
2  8  1  9
4  4  3  5
5  0  1  3
6  1  8  6
7  7  4  7
8  7  5  3
2  7  9  9
1  0  1  2
3  1  3  4
5  0  0  3
for i, (idx, row) in zip(np.arange(len(testDF.index)), testDF.iterrows()):
    print("Currently on row: {}; Currently iterrated {}% of rows".format(idx, (i + 1)/len(testDF.index) * 100))
Currently on row: 2; Currently iterrated 10.0% of rows
Currently on row: 4; Currently iterrated 20.0% of rows
Currently on row: 5; Currently iterrated 30.0% of rows
Currently on row: 6; Currently iterrated 40.0% of rows
Currently on row: 7; Currently iterrated 50.0% of rows
Currently on row: 8; Currently iterrated 60.0% of rows
Currently on row: 2; Currently iterrated 70.0% of rows
Currently on row: 1; Currently iterrated 80.0% of rows
Currently on row: 3; Currently iterrated 90.0% of rows
Currently on row: 5; Currently iterrated 100.0% of rows
Answer 1 (score: 2)
First of all, iterrows yields (index, row) tuples, so the correct code is
for index, row in testDF.iterrows():
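In general the index is not a row number; it is some identifier. This is the power of pandas, but it causes some confusion, because it does not behave like an ordinary Python list, where the index is the position of the element. That is why we need to count rows independently. We could introduce line_number = 0 and increment it on each iteration with line_number += 1. But Python gives us a ready-made tool for this: enumerate, which yields (line_number, value) tuples instead of just value. So we end up with this code: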
for (line_number, (index, row)) in enumerate(testDF.iterrows()):
    print("Currently on row: {}; Currently iterrated {}% of rows".format(
        line_number, 100*(line_number + 1)/len(testDF)))
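For comparison, the manual counter mentioned above might look like the following sketch. The sample frame and the `messages` list are assumptions added here so the result can be inspected; they are not part of the original answer.

```python
import pandas as pd

# Made-up sample data with a non-sequential, non-unique index.
testDF = pd.DataFrame({"value": [10, 20, 30, 40]}, index=[7, 3, 7, 1])

messages = []
line_number = 0  # counts rows independently of the index labels
for index, row in testDF.iterrows():
    percent = 100 * (line_number + 1) / len(testDF)
    messages.append(
        "Currently on row: {}; Currently iterrated {}% of rows".format(
            line_number, percent))
    line_number += 1  # advance the counter once per row

print("\n".join(messages))
```

This does the same bookkeeping as enumerate, just by hand, which is why enumerate is usually the tidier choice.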
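P.S. When you divide integers, Python 2 returns an integer, which is why 999/1000 = 0, which is not what you would expect. So you can either convert to float or move the 100* to the front to get an integer percentage.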
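To make the P.S. concrete, here is a small sketch written in Python 3 syntax, where `//` behaves the way `/` does on two integers in Python 2. The variable names are illustrative only.

```python
total = 1000
done = 999

floor_ratio = done // total        # floor division, like Python 2's int / int
true_ratio = done / float(total)   # float division in both Python 2 and 3
int_percent = 100 * done // total  # multiply first to keep an integer percent

print(floor_ratio, true_ratio, int_percent)
```

Multiplying by 100 before dividing keeps the precision that floor division would otherwise throw away.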
Answer 2 (score: 2)
For large DataFrames it may be better to limit printing, since printing is itself a time-consuming task.
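One way to do that, sketched under assumptions: report progress only every `step` rows and on the final row. The `step` value, the `reported` list, the message wording, and the sample frame are all hypothetical choices made for this illustration.

```python
import pandas as pd

# Made-up data: 1000 rows with the default 0..999 index.
testDF = pd.DataFrame({"value": range(1000)})

step = 100  # assumed reporting interval; adjust as needed
total = len(testDF)
reported = []  # row counts at which a progress line was printed

for line_number, (index, row) in enumerate(testDF.iterrows(), start=1):
    # Report only every `step`-th row, plus the final row.
    if line_number % step == 0 or line_number == total:
        percent = 100 * line_number / total
        reported.append(line_number)
        print("Processed {} of {} rows ({:.0f}%)".format(
            line_number, total, percent))
```

With these settings the loop prints ten progress lines instead of a thousand.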